Abstract

The broadcast phase (downlink transmission) of the two-way relay network is studied in the source
coding and joint source-channel coding settings. The rates needed for reliable communication are
characterised for a number of special cases including: small distortions, deterministic distortion
measures, and jointly Gaussian sources with quadratic distortion measures. The broadcast problem is
also studied with common-reconstruction decoding constraints, and the rates needed for reliable
communication are characterised for all discrete memoryless sources and per-letter distortion measures.
I. Introduction

Consider the two-way relay network shown in Figure 1. User 1 requires an approximate copy $\hat{X}$ of
the data $X$ from user 2, and user 2 requires an approximate copy $\hat{Y}$ of the data $Y$ from user 1.
The users are physically separated and direct communication is not possible. Instead, indirect
communication is achieved via a relay and a two-phase communication protocol. In phase 1 (uplink
transmission), each user encodes its data to a codeword that is transmitted over a multiple access
channel to the relay. In phase 2 (downlink transmission), the relay completely or partly decodes the
noise-corrupted codewords it receives from the multiple access channel, and it transmits a new codeword
over a broadcast channel to both users. From this broadcast transmission, user 1 decodes $\hat{X}$ and
user 2 decodes $\hat{Y}$.
Fig. 1. The two-way relay network: user 1 has $Y$ and requires a copy $\hat{X}$ of $X$ from user 2;
similarly, user 2 has $X$ and requires a copy $\hat{Y}$ of $Y$ from user 1. Figure 1(a) depicts the
uplink (phase 1) and Figure 1(b) depicts the downlink (phase 2).



In this paper, we study the downlink for the case where $X$ and $Y$ have been perfectly decoded by
the relay after the uplink transmission (Figure 2). We are interested in the lossy setting where
$\hat{X}$ and $\hat{Y}$ need to satisfy average distortion constraints. We have a source coding problem
(Figure 2(a)) when the broadcast channel is noiseless, and we have a joint source-channel coding problem
when the broadcast channel is noisy (Figure 2(b)). In Figure 2, we have relabelled the relay as the
transmitter, user 1 as receiver 1 and user 2 as receiver 2. We note that the source coding problem is a
special case of the joint source-channel coding problem; however, we will present each problem
separately for clarity.

It is worthwhile to briefly discuss some of the implicit assumptions in the two-way relay network setup.
The no-direct-communication assumption has been adopted by many authors including Oechtering et
al. [1], [2], Gündüz, Tuncel and Nayak [3] as well as Wyner, Wolf and Willems [4]. It is appropriate
when the users are separated by a vast physical distance and communication is via a satellite. It is
also appropriate when direct communication is prevented by practical system considerations. In cellular
networks, for example, two mobile phones located within the same cell will communicate with each other
via their local base-station. We note that this assumption differs from Shannon's classic formulation of
the two-way communication problem [5], [6]. Specifically, those works assume that the users exchange
data directly over a discrete memoryless channel without using a relay. The two-phase communication
protocol assumption (uplink and downlink) is appropriate when the users and relay cannot transmit and
receive at the same time on the same channel [1], [7]. This again contrasts with Shannon's two-way
communication problem [5] as well as Gündüz, Tuncel and Nayak's separated relay [3], where simultaneous
transmission and reception is permitted. Finally, this relay network is restricted in the sense that it
does not permit feedback [5]; that is, each user cannot use previously decoded data when encoding new
data.
Fig. 2. Lossy broadcasting in two-way relay networks. The source coding and joint source-channel coding
problems are shown in Figures 2(a) and 2(b), respectively.



Notation: The non-negative real numbers are written $\mathbb{R}_+$. Random variables and random vectors
are identified by uppercase and bolded uppercase letters, respectively. The alphabet of a random variable
is identified by matching calligraphic typeface, and a generic element of an alphabet is identified by a
matching lowercase letter. For example, $X$ represents a random variable that takes values $x$ from a
finite alphabet $\mathcal{X}$, and $\mathbf{X} = X_1, X_2, \ldots, X_n$ denotes a vector of random
variables with each taking values from $\mathcal{X}$. The length of a random vector will be clear from
context. The $n$-fold Cartesian product of a single set is identified by a superscript $n$. For example,
$\mathcal{X}^n$ is the $n$-fold product of $\mathcal{X}$.

Paper Outline: In Section II, we formally state the problem and review some basic RD functions. We
present our main results in Section III, and we prove these results in Sections IV and V. The paper is
concluded in Section VI.



II. Formal Problem Statement & Definitions 

Let $\mathcal{X}$, $\mathcal{Y}$, $\hat{\mathcal{X}}$ and $\hat{\mathcal{Y}}$ be finite alphabets, and
let $q_{XY}(x,y) = \Pr[X = x, Y = y]$ be a generic probability mass function (pmf) on
$\mathcal{X} \times \mathcal{Y}$. The source coding and joint source-channel coding problems
are defined next.
A. Source Coding

Assume that $(\mathbf{X}, \mathbf{Y}) = (X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ is drawn independent
and identically distributed (iid) according to $q_{XY}(x,y)$. A rate-distortion (RD) blockcode is a
triple of mappings $(f^{(n)}, g_1^{(n)}, g_2^{(n)})$, where

$$f^{(n)} : \mathcal{X}^n \times \mathcal{Y}^n \to \mathcal{M}^{(n)}, \tag{1a}$$

$$g_1^{(n)} : \mathcal{M}^{(n)} \times \mathcal{Y}^n \to \hat{\mathcal{X}}^n \quad \text{and} \tag{1b}$$

$$g_2^{(n)} : \mathcal{M}^{(n)} \times \mathcal{X}^n \to \hat{\mathcal{Y}}^n. \tag{1c}$$

Here $f^{(n)}$ denotes the encoder at the transmitter and $g_i^{(n)}$ denotes the decoder at receiver
$i = 1,2$; see Figure 3(a). The compression rate of an RD code $(f^{(n)}, g_1^{(n)}, g_2^{(n)})$ is
defined by

$$\kappa^{(n)} \triangleq \frac{1}{n} \log_2 |\mathcal{M}^{(n)}|, \tag{2}$$

where $|\mathcal{M}^{(n)}|$ denotes the cardinality of $\mathcal{M}^{(n)}$. We use the parenthesised
superscript $(n)$ to emphasize that a blockcode of length $n$ is under consideration.
Fig. 3. Figure (a): Encoder and decoder structure for source coding at rate $R(d_1, d_2)$. Figure (b):
Encoder and decoder structure for source coding with common reconstructions at rate $R_{CR}(d_1, d_2)$.



The reconstruction quality of the decoded data is quantified in the usual way via average per-letter
distortions. To this end, we let

$$\delta_1 : \mathcal{X} \times \hat{\mathcal{X}} \to [0, d_{1,\max}] \quad \text{and} \tag{3a}$$

$$\delta_2 : \mathcal{Y} \times \hat{\mathcal{Y}} \to [0, d_{2,\max}] \tag{3b}$$

be bounded per-letter distortion measures. To simplify our presentation, we assume that $\delta_1$ and
$\delta_2$ are normal [8]. That is, for all $x \in \mathcal{X}$ we have $\delta_1(x,\hat{x}) = 0$ for
some $\hat{x} \in \hat{\mathcal{X}}$. Similarly, for all $y \in \mathcal{Y}$ we have
$\delta_2(y,\hat{y}) = 0$ for some $\hat{y} \in \hat{\mathcal{Y}}$. This assumption is not too
restrictive, and our results can be extended to more general distortion measures [8]. We call
$\delta_1$ a Hamming distortion measure if $\mathcal{X} = \hat{\mathcal{X}}$,
$\delta_1(x,\hat{x}) = 0$ for $\hat{x} = x$ and $\delta_1(x,\hat{x}) = 1$ for $\hat{x} \neq x$. We call
$\delta_1$ a difference distortion measure [9] if it can be written in the form
$\delta_1(x - \hat{x})$, where $\mathcal{X} = \hat{\mathcal{X}} = \{0, 1, \ldots, l_x - 1\}$ and the
subtraction is performed modulo-$l_x$. The same naming convention applies to $\delta_2$.

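The average distortions $(\Delta_1^{(n)}, \Delta_2^{(n)})$ of an RD code
$(f^{(n)}, g_1^{(n)}, g_2^{(n)})$ are defined by

$$\Delta_1^{(n)} \triangleq \mathbb{E}\left[ \frac{1}{n} \sum_{i=1}^{n} \delta_1(X_i, \hat{X}_i) \right] \quad \text{and} \tag{4a}$$

$$\Delta_2^{(n)} \triangleq \mathbb{E}\left[ \frac{1}{n} \sum_{i=1}^{n} \delta_2(Y_i, \hat{Y}_i) \right], \tag{4b}$$

where $\hat{\mathbf{X}} = g_1^{(n)}(M, \mathbf{Y})$, $\hat{\mathbf{Y}} = g_2^{(n)}(M, \mathbf{X})$,
$M = f^{(n)}(\mathbf{X}, \mathbf{Y})$, and $\mathbb{E}[\cdot]$ denotes the expectation operator.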

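Definition 1 (Source Coding): Let $(d_1, d_2) \in \mathbb{R}_+^2$. A rate $r \in \mathbb{R}_+$ is said
to be $(d_1, d_2)$-achievable if for arbitrary $\epsilon > 0$ there exists an RD code
$(f^{(n)}, g_1^{(n)}, g_2^{(n)})$ for some sufficiently large $n$ with

$$\kappa^{(n)} \le r + \epsilon \quad \text{and} \tag{5a}$$

$$\Delta_i^{(n)} \le d_i + \epsilon, \quad i = 1,2. \tag{5b}$$

Let $\mathcal{R}(d_1, d_2)$ denote the set of all $(d_1, d_2)$-achievable rates, and let

$$R(d_1, d_2) \triangleq \min_{r \in \mathcal{R}(d_1, d_2)} r. \tag{6}$$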



Definition 1 does not require that the two receivers agree on the exact realizations of
$\hat{\mathbf{X}}$ and $\hat{\mathbf{Y}}$. For example, receiver 1 need not know the exact realization of
$\hat{\mathbf{Y}}$. In some scenarios,¹ it is appropriate that the receivers exactly agree on
$\hat{\mathbf{X}}$ and $\hat{\mathbf{Y}}$. The notion of common reconstructions is useful for such
scenarios.

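A common-reconstructions rate-distortion (CR-RD) code is a tuple of mappings
$(f^{(n)}, g_1^{(n)}, g_2^{(n)}, \phi_1^{(n)}, \phi_2^{(n)})$, where $f^{(n)}$ and $g_i^{(n)}$ are given
by (1) and

$$\phi_1^{(n)} : \mathcal{M}^{(n)} \times \mathcal{Y}^n \to \hat{\mathcal{Y}}^n \quad \text{and} \tag{7a}$$

$$\phi_2^{(n)} : \mathcal{M}^{(n)} \times \mathcal{X}^n \to \hat{\mathcal{X}}^n. \tag{7b}$$

Here $\phi_i^{(n)}$ denotes the "common-reconstruction" decoder at receiver $i = 1,2$; see Figure 3(b).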



¹ Examples of such problems can be found in Steinberg's work [10] on common reconstructions for the Wyner-Ziv problem.
The rate $\kappa^{(n)}$ and average distortions $(\Delta_1^{(n)}, \Delta_2^{(n)})$ of a CR-RD code are
defined in the same manner as (2) and (4). Additionally, we define the average probability of
common-reconstruction decoding error by

$$P_e \triangleq \max\left\{ \Pr[\tilde{\mathbf{X}} \neq \hat{\mathbf{X}}],\ \Pr[\tilde{\mathbf{Y}} \neq \hat{\mathbf{Y}}] \right\}, \tag{8}$$

where $\tilde{\mathbf{Y}} = \phi_1^{(n)}(M, \mathbf{Y})$ and
$\tilde{\mathbf{X}} = \phi_2^{(n)}(M, \mathbf{X})$.

Definition 2 (Source Coding with Common Reconstructions): Let $(d_1, d_2) \in \mathbb{R}_+^2$. A rate
$r \in \mathbb{R}_+$ is said to be $(d_1, d_2)$-achievable with common reconstructions if for arbitrary
$\epsilon > 0$ there exists a CR-RD code $(f^{(n)}, g_1^{(n)}, g_2^{(n)}, \phi_1^{(n)}, \phi_2^{(n)})$
with $(\kappa^{(n)}, \Delta_1^{(n)}, \Delta_2^{(n)})$ satisfying (5) and $P_e \le \epsilon$. Let
$\mathcal{R}_{CR}(d_1, d_2)$ denote the set of all $(d_1, d_2)$-achievable rates with common
reconstructions, and let

$$R_{CR}(d_1, d_2) \triangleq \min_{r \in \mathcal{R}_{CR}(d_1, d_2)} r. \tag{9}$$

The next proposition follows directly from Definitions 1 and 2.

Proposition 1: The RD function $R(d_1, d_2)$ and the CR-RD function $R_{CR}(d_1, d_2)$ are continuous,
non-increasing and convex on $\mathbb{R}_+^2$. Moreover, for $(d_1, d_2) \in \mathbb{R}_+^2$ we have
that

$$R(d_1, d_2) \le R_{CR}(d_1, d_2). \tag{10}$$

General Remark: The common-reconstruction condition used in this paper was inspired by Steinberg's
study [10] of common reconstructions for the Wyner-Ziv problem.



B. Joint Source-Channel Coding 

Consider the joint source-channel coding problem. Suppose that the source $q_{XY}$ emits symbols at the
rate $\kappa_s$, and that the channel accepts and emits symbols at the rate $\kappa_c$. Let
$\mathcal{W}$ denote the channel input alphabet, let $\mathcal{U} \times \mathcal{V}$ denote the product
of the channel output alphabets, and let the transitions from $\mathcal{W}$ to
$\mathcal{U} \times \mathcal{V}$ be governed by the conditional pmf
$q_{UV|W}(u, v|w) = \Pr[U = u, V = v \mid W = w]$. The ratio of channel symbols to source symbols,

$$\kappa = \frac{\kappa_c}{\kappa_s}, \tag{11}$$

is called the bandwidth expansion. In the sequel, $\kappa_s$ and $\kappa_c$ are arbitrary fixed
constants.

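A joint source-channel (JSC) blockcode of length $t$, with $\kappa_s t$ and $\kappa_c t$ being integers,
is a triple of mappings $(f^{(t)}, g_1^{(t)}, g_2^{(t)})$. Here

$$f^{(t)} : \mathcal{X}^{\kappa_s t} \times \mathcal{Y}^{\kappa_s t} \to \mathcal{W}^{\kappa_c t} \tag{12a}$$

denotes the encoder at the transmitter, and

$$g_1^{(t)} : \mathcal{U}^{\kappa_c t} \times \mathcal{Y}^{\kappa_s t} \to \hat{\mathcal{X}}^{\kappa_s t} \quad \text{and} \tag{12b}$$

$$g_2^{(t)} : \mathcal{V}^{\kappa_c t} \times \mathcal{X}^{\kappa_s t} \to \hat{\mathcal{Y}}^{\kappa_s t} \tag{12c}$$

denote the decoders at receivers $i = 1,2$.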

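A common-reconstruction joint source-channel (CR-JSC) blockcode is a tuple of mappings
$(f^{(t)}, g_1^{(t)}, g_2^{(t)}, \phi_1^{(t)}, \phi_2^{(t)})$, where $f^{(t)}$ and $g_i^{(t)}$ are
defined in (12) and

$$\phi_1^{(t)} : \mathcal{U}^{\kappa_c t} \times \mathcal{Y}^{\kappa_s t} \to \hat{\mathcal{Y}}^{\kappa_s t} \quad \text{and} \tag{13a}$$

$$\phi_2^{(t)} : \mathcal{V}^{\kappa_c t} \times \mathcal{X}^{\kappa_s t} \to \hat{\mathcal{X}}^{\kappa_s t}. \tag{13b}$$

Here $\phi_i^{(t)}$ denotes the "common-reconstruction" decoder at receiver $i = 1,2$.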

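The average distortions $(\Delta_1^{(\kappa_s t)}, \Delta_2^{(\kappa_s t)})$ of JSC and CR-JSC codes are
defined by (4a) and (4b), where $\kappa_s t$ replaces $n$ in the sum, and we set
$\hat{\mathbf{X}} = g_1^{(t)}(\mathbf{U}, \mathbf{Y})$,
$\hat{\mathbf{Y}} = g_2^{(t)}(\mathbf{V}, \mathbf{X})$ and
$\mathbf{W} = f^{(t)}(\mathbf{X}, \mathbf{Y})$. The probability law of $\mathbf{U}$ and $\mathbf{V}$ is
defined by the discrete memoryless broadcast channel

$$q_{\mathbf{U}\mathbf{V}|\mathbf{W}}(\mathbf{u}, \mathbf{v}|\mathbf{w}) = \prod_{i=1}^{\kappa_c t} q_{UV|W}(u_i, v_i|w_i).$$

For the CR-JSC code, the probability of common-reconstruction decoding error $P_e$ is defined by (8),
where $\tilde{\mathbf{Y}} = \phi_1^{(t)}(\mathbf{U}, \mathbf{Y})$ and
$\tilde{\mathbf{X}} = \phi_2^{(t)}(\mathbf{V}, \mathbf{X})$.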

Definition 3 (Joint Source-Channel Coding): A distortion pair $(d_1, d_2) \in \mathbb{R}_+^2$ is said to
be achievable with bandwidth expansion $\kappa$ if for every $\epsilon > 0$ there exists a joint
source-channel code $(f^{(t)}, g_1^{(t)}, g_2^{(t)})$ for some sufficiently large $t$ with

$$\Delta_i^{(\kappa_s t)} \le d_i + \epsilon, \quad i = 1,2. \tag{14}$$

Definition 4 (Joint Source-Channel Coding with Common Reconstructions): A distortion pair
$(d_1, d_2) \in \mathbb{R}_+^2$ is said to be achievable with CR and bandwidth expansion $\kappa$ if for
every $\epsilon > 0$ there exists a CR-JSC code
$(f^{(t)}, g_1^{(t)}, g_2^{(t)}, \phi_1^{(t)}, \phi_2^{(t)})$ for some sufficiently large $t$ with
$(\Delta_1^{(\kappa_s t)}, \Delta_2^{(\kappa_s t)})$ satisfying (14) and $P_e \le \epsilon$.

C. Basic Rate-Distortion Functions 

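In this section, we briefly review some rate-distortion functions that will be used frequently throughout
the paper. Let

$$q_X(x) = \sum_{y \in \mathcal{Y}} q_{XY}(x,y), \quad x \in \mathcal{X}, \tag{15}$$

denote the $X$-marginal of $q_{XY}$. (This notation will be extended to all marginal pmfs.) Let
$\mathcal{P}_{\hat{X}|X}(d_1)$ denote the set of channels $p_{\hat{X}|X}$ mapping $\mathcal{X}$ to
$\hat{\mathcal{X}}$ such that

$$\sum_{(x,\hat{x}) \in \mathcal{X} \times \hat{\mathcal{X}}} p_{\hat{X}|X}(\hat{x}|x)\, q_X(x)\, \delta_1(x,\hat{x}) \le d_1. \tag{16}$$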
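Definition 5 (RD Function): For $d_1 \in \mathbb{R}_+$, the RD function of $X$ is defined by
[11, Chap. 10]

$$R_X(d_1) \triangleq \min_{p_{\hat{X}|X} \in \mathcal{P}_{\hat{X}|X}(d_1)} I(X;\hat{X}). \tag{17}$$

Let $\mathcal{P}_{\hat{X}\hat{Y}|XY}(d_1, d_2)$ denote the set of channels $p_{\hat{X}\hat{Y}|XY}$
mapping $\mathcal{X} \times \mathcal{Y}$ to $\hat{\mathcal{X}} \times \hat{\mathcal{Y}}$ such that

$$\sum_{x,y,\hat{x},\hat{y}} p_{\hat{X}\hat{Y}|XY}(\hat{x},\hat{y}|x,y)\, q_{XY}(x,y)\, \delta_1(x,\hat{x}) \le d_1 \quad \text{and} \tag{18a}$$

$$\sum_{x,y,\hat{x},\hat{y}} p_{\hat{X}\hat{Y}|XY}(\hat{x},\hat{y}|x,y)\, q_{XY}(x,y)\, \delta_2(y,\hat{y}) \le d_2. \tag{18b}$$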
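Definition 6 (Joint RD Function): For $(d_1, d_2) \in \mathbb{R}_+^2$, the joint RD function of $X$ and
$Y$ is defined by

$$R_{XY}(d_1, d_2) \triangleq \min_{p_{\hat{X}\hat{Y}|XY} \in \mathcal{P}_{\hat{X}\hat{Y}|XY}(d_1, d_2)} I(X,Y;\hat{X},\hat{Y}). \tag{19}$$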



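Let $\mathcal{P}_{\hat{X}|XY}(d_1)$ denote the set of all channels $p_{\hat{X}|XY}$ mapping
$\mathcal{X} \times \mathcal{Y}$ to $\hat{\mathcal{X}}$ such that

$$\sum_{x,y,\hat{x}} p_{\hat{X}|XY}(\hat{x}|x,y)\, q_{XY}(x,y)\, \delta_1(x,\hat{x}) \le d_1. \tag{20}$$

Definition 7 (Conditional RD Function [12]): For $d_1 \in \mathbb{R}_+$, the conditional RD function of
$X$ given $Y$ is defined by

$$R_{X|Y}(d_1) \triangleq \min_{p_{\hat{X}|XY} \in \mathcal{P}_{\hat{X}|XY}(d_1)} I(X;\hat{X}|Y). \tag{21}$$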

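Let $\mathcal{A}$ be a finite set of cardinality $|\mathcal{A}| \le |\mathcal{X}| + 1$. Let
$\mathcal{P}^{WZ}_{X|Y}(d_1)$ denote the set of pmfs $p_{AXY}$ on
$\mathcal{A} \times \mathcal{X} \times \mathcal{Y}$ such that:

$$\sum_{a \in \mathcal{A}} p_{AXY}(a,x,y) = q_{XY}(x,y), \quad (x,y) \in \mathcal{X} \times \mathcal{Y}, \tag{22}$$

$A \leftrightarrow X \leftrightarrow Y$ forms a Markov chain, and there exists a function
$\pi_1 : \mathcal{A} \times \mathcal{Y} \to \hat{\mathcal{X}}$ such that

$$\sum_{(a,x,y)} p_{AXY}(a,x,y)\, \delta_1(x, \pi_1(a,y)) \le d_1. \tag{23}$$

Definition 8 (Wyner-Ziv RD Function): For $d_1 \in \mathbb{R}_+$, the Wyner-Ziv RD function for $X$
given $Y$ is defined by [13]

$$R^{WZ}_{X|Y}(d_1) \triangleq \min_{p_{AXY} \in \mathcal{P}^{WZ}_{X|Y}(d_1)} I(X;A|Y). \tag{24}$$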



The final function that we will need to define is the minimax (or, worst noise) capacity $C_X(d_1)$.
This function was used by Zamir in [9] to bound the rate loss in the Wyner-Ziv problem. We shall use it
in a similar manner to approximate $R(d_1, d_2)$. Before defining $C_X(d_1)$, we first need to define
the capacity of an additive channel with an input distortion constraint.

Definition 9: Let $N$ be a random variable that takes values from
$\mathcal{X} = \{0, 1, \ldots, l_x - 1\}$, and let $p_N$ denote its pmf. Consider the additive-noise
channel that randomly maps $\mathcal{X}$ to $\mathcal{X}$ via $x \mapsto x \oplus N$; that is, consider
$N$ to be modulo-$l_x$ additive noise. The capacity of this channel (with an input distortion
constraint $d_1$) is defined by

$$C_X(d_1, N) \triangleq \sup_{W} I(W; W \oplus N), \tag{25}$$

where the supremum is taken over all choices of a random variable $W$ (defined on $\mathcal{X}$ with pmf
$p_W$ and independent of $N$) for which

$$\sum_{x \in \mathcal{X}} p_W(x)\, \delta_1(x) \le d_1. \tag{26}$$

Definition 10: The minimax (worst noise) capacity under distortion constraint $d_1$ is defined by [9]

$$C_X(d_1) \triangleq \inf_{N} C_X(d_1, N), \tag{27}$$

where the infimum is taken over all choices of a "noise" random variable $N$ such that

$$\sum_{x \in \mathcal{X}} p_N(x)\, \delta_1(x) \le d_1. \tag{28}$$

III. Main Results 

A. Main Results for Source Coding 

Our first result is a single-letter characterisation of $R_{CR}(d_1, d_2)$ for arbitrary sources and
distortion measures. For $(d_1, d_2) \in \mathbb{R}_+^2$, define

$$R^*_{CR}(d_1, d_2) = \min_{p_{\hat{X}\hat{Y}|XY} \in \mathcal{P}_{\hat{X}\hat{Y}|XY}(d_1, d_2)} \max\left\{ I(X;\hat{X},\hat{Y}|Y),\ I(Y;\hat{X},\hat{Y}|X) \right\}, \tag{29}$$

where $\mathcal{P}_{\hat{X}\hat{Y}|XY}(d_1, d_2)$ is defined in Section II-C. The next result is proved
in Section IV-C.

Theorem 1: For $(d_1, d_2) \in \mathbb{R}_+^2$, the CR-RD function is given by

$$R_{CR}(d_1, d_2) = R^*_{CR}(d_1, d_2). \tag{30}$$

Theorem 1 is best understood in the context of the joint RD function of $X$ and $Y$. Specifically, since
$I(X;\hat{X},\hat{Y}|Y) = I(X,Y;\hat{X},\hat{Y}) - I(Y;\hat{X},\hat{Y})$ (and similarly with the roles
of $X$ and $Y$ exchanged), $R^*_{CR}(d_1, d_2)$ can be rewritten as

$$R^*_{CR}(d_1, d_2) = \min_{p_{\hat{X}\hat{Y}|XY} \in \mathcal{P}_{\hat{X}\hat{Y}|XY}(d_1, d_2)} \left[ I(X,Y;\hat{X},\hat{Y}) - \min\left\{ I(X;\hat{X},\hat{Y}),\ I(Y;\hat{X},\hat{Y}) \right\} \right], \tag{31}$$

which can be interpreted as joint vector quantization coding followed by Slepian-Wolf coding. The encoder
jointly maps $(\mathbf{X}, \mathbf{Y})$ to $(\hat{\mathbf{X}}, \hat{\mathbf{Y}})$. The
common-reconstruction condition requires that $\hat{\mathbf{X}}$ and $\hat{\mathbf{Y}}$ satisfy the
average distortion constraints $d_1$ and $d_2$, respectively. The rate needed to simultaneously satisfy
these constraints is captured by the $I(X,Y;\hat{X},\hat{Y})$ term. The
$\min\{I(X;\hat{X},\hat{Y}), I(Y;\hat{X},\hat{Y})\}$ term captures the fact that the rate
$I(X,Y;\hat{X},\hat{Y})$ can be reduced by exploiting the side-information at each receiver with
a Slepian-Wolf code.
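
The equivalence between the max form (29) and the difference form (31) holds for every test channel, not only at the optimum, because it is just the chain rule for mutual information. The sketch below checks this numerically for one randomly drawn joint pmf on binary $(X, Y, \hat{X}, \hat{Y})$. The helper names (`marginal`, `cond_mi`, `random_joint`) and the binary alphabets are illustrative assumptions and do not come from the paper.

```python
import itertools
import math
import random


def marginal(joint, idx):
    """Marginal pmf over the coordinates listed in idx."""
    out = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out


def cond_mi(joint, a, b, c=()):
    """Conditional mutual information I(A;B|C) in bits; a, b, c are index tuples."""
    p_abc = marginal(joint, a + b + c)
    p_ac = marginal(joint, a + c)
    p_bc = marginal(joint, b + c)
    p_c = marginal(joint, c)  # equals {(): 1.0} when c is empty

    def key(outcome, idx):
        return tuple(outcome[i] for i in idx)

    total = 0.0
    for outcome, p in joint.items():
        if p > 0.0:
            num = p_abc[key(outcome, a + b + c)] * p_c[key(outcome, c)]
            den = p_ac[key(outcome, a + c)] * p_bc[key(outcome, b + c)]
            total += p * math.log2(num / den)
    return total


def random_joint(rng):
    """Random strictly positive pmf on binary (x, y, x_hat, y_hat)."""
    weights = {o: rng.random() + 1e-3 for o in itertools.product((0, 1), repeat=4)}
    norm = sum(weights.values())
    return {o: w / norm for o, w in weights.items()}


X, Y, XH, YH = (0,), (1,), (2,), (3,)
joint = random_joint(random.Random(0))

# Objective of (29): max of the two conditional mutual informations.
lhs = max(cond_mi(joint, X, XH + YH, Y), cond_mi(joint, Y, XH + YH, X))
# Objective of (31): joint term minus the larger side-information saving.
rhs = cond_mi(joint, X + Y, XH + YH) - min(
    cond_mi(joint, X, XH + YH), cond_mi(joint, Y, XH + YH)
)
print(lhs, rhs)
```

The two printed values agree up to floating-point error, which is why minimising either objective over the same constraint set gives the same function.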

Remark 1: This joint vector quantization and Slepian-Wolf coding structure implicitly allows the
encoder to know $\hat{\mathbf{X}}$ and $\hat{\mathbf{Y}}$ with high probability. We can therefore impose
a third common-reconstruction constraint at the transmitter without suffering a rate loss. That is, the
RD function with common reconstructions at the transmitter and both receivers is equal to
$R_{CR}(d_1, d_2)$. This result is to be expected because the transmitter has $\mathbf{X}$ and
$\mathbf{Y}$, from which it can always compute $\hat{\mathbf{X}}$ and $\hat{\mathbf{Y}}$. What is less
obvious, however, is that this result will also hold in the joint source-channel setting. Specifically,
it will be optimal for the encoder to know $\hat{\mathbf{X}}$ and $\hat{\mathbf{Y}}$ with high
probability. This result is not obvious because it is sometimes necessary to exploit randomness in the
channel to efficiently induce distortions [14].

Theorem 1 gives a relatively straightforward single-letter characterisation of $R_{CR}(d_1, d_2)$. In
contrast, giving a single-letter characterisation of $R(d_1, d_2)$ is much more difficult. A simple
lower bound for $R(d_1, d_2)$ stems from the following cut-set argument: $R(d_1, d_2)$ must be at least
as large as the smallest rate that is needed to compress $X$ at the transmitter for decoding by
receiver 1, while ignoring the distortion constraint on $Y$ for receiver 2. The smallest such rate is
given by the conditional RD function $R_{X|Y}(d_1)$. More formally, we have the following.

Proposition 2: For $(d_1, d_2) \in \mathbb{R}_+^2$ we have that

$$R(d_1, d_2) \ge R_L(d_1, d_2), \tag{32}$$

where

$$R_L(d_1, d_2) \triangleq \max\left\{ R_{X|Y}(d_1),\ R_{Y|X}(d_2) \right\}. \tag{33}$$

Surprisingly, $R_L(d_1, d_2)$ is the tightest lower bound in the literature. It equals $R(d_1, d_2)$ in
the high-distortion regime where $d_1 = d_{1,\max}$ or $d_2 = d_{2,\max}$, but it is an open problem as
to whether $R_L(d_1, d_2)$ always equals² $R(d_1, d_2)$. The next example describes a simple binary
source where $R_L(d_1, d_2)$ is equal to $R(d_1, d_2)$. This example was also given in [15]. We review
it here because it is relevant to the following discussion.

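Definition 11: The source $q_{XY}$ is said to be a Doubly Symmetric Binary Source (DSBS) with
cross-over probability $p$ if $\mathcal{X} = \mathcal{Y} = \hat{\mathcal{X}} = \hat{\mathcal{Y}} = \{0, 1\}$,
$p \in [0, 1/2]$ and

$$q_{XY}(x, y) = \frac{1}{2}(1 - p)\,(1 - \mathbb{1}_{x,y}) + \frac{1}{2}\, p\, \mathbb{1}_{x,y}, \tag{34}$$

where

$$\mathbb{1}_{x,y} \triangleq \begin{cases} 0, & \text{if } x = y \\ 1, & \text{otherwise.} \end{cases} \tag{35}$$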

² Two upper bounds for $R(d_1, d_2)$ have been given in [15] and [16]. We discuss these bounds in
Section IV.
We can view $q_{XY}$ as resulting from the equation $Y = X \oplus Z$. Here $X$ is uniform on
$\mathcal{X}$, $\oplus$ denotes modulo-two addition, and $Z$ is independent of $X$ and takes values from
$\{0, 1\}$ with probability $q_Z(0) = 1 - p$ and $q_Z(1) = p$.

Example 1: If $q_{XY}$ is the DSBS with cross-over probability $p$ and $\delta_1$ and $\delta_2$ are
Hamming measures, then for all $d \in [0, 1]$ we have that [12]

$$R_{X|Y}(d) = R_{Y|X}(d) = \begin{cases} h(p) - h(d), & \text{if } 0 \le d \le p \\ 0, & \text{otherwise,} \end{cases} \tag{36}$$

where

$$h(\lambda) \triangleq -\lambda \log_2 \lambda - (1 - \lambda) \log_2 (1 - \lambda) \tag{37}$$

is the binary entropy function (take $h(0) = h(1) = 0$). Let $d_{\min} = \min\{d_1, d_2\}$. Clearly, we
have that $R(d_1, d_2) = R_L(d_1, d_2) = 0$ for $d_{\min} \ge p$ because each receiver can estimate its
reconstruction directly from its side-information. For $d_{\min} < p$, the transmitter computes
$\mathbf{Z} = \mathbf{X} \oplus \mathbf{Y}$ and sends a distorted version $\hat{\mathbf{Z}}$ of
$\mathbf{Z}$ to both receivers with an average (Hamming) distortion of $d_{\min}$. This can be done with
a binary RD code of rate $R_Z(d_{\min}) = h(p) - h(d_{\min})$; for example, see [11, Thm. 10.3.1].
Receiver 1 decodes $\hat{\mathbf{X}}$ by setting $\hat{X}_i = \hat{Z}_i \oplus Y_i$ for
$i = 1, 2, \ldots, n$. Similarly, receiver 2 decodes $\hat{\mathbf{Y}}$ by setting
$\hat{Y}_i = \hat{Z}_i \oplus X_i$. It can be verified that both reconstructions, $\hat{\mathbf{X}}$ and
$\hat{\mathbf{Y}}$, achieve an average distortion $d_{\min}$. The RD function is therefore given by

$$R(d_1, d_2) = \begin{cases} h(p) - h(d_{\min}), & \text{if } d_{\min} < p \\ 0, & \text{otherwise.} \end{cases} \tag{38}$$

It is worth noting that the above code achieves an average distortion $d_{\min}$ for both receivers;
that is, it operates at the point $R(d_{\min}, d_{\min})$. Note also that this code does not satisfy
Definition 2 (e.g., receiver 1 cannot compute $\hat{Y}_i = \hat{Z}_i \oplus X_i$), so it cannot be used
as a CR-RD code. The RD function is plotted for $p = 0.25$ in Figure 4.
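
The following sketch evaluates (38) and illustrates why both receivers see exactly the distortion of $\hat{\mathbf{Z}}$: since $\hat{X}_i \oplus X_i = \hat{Z}_i \oplus Y_i \oplus X_i = \hat{Z}_i \oplus Z_i$, every error in $\hat{\mathbf{Z}}$ becomes one error at each receiver. As a loudly stated assumption, $\hat{\mathbf{Z}}$ is produced here by flipping bits of $\mathbf{Z}$ independently with probability `flip`, which is only a stand-in for a real rate-distortion code and is not rate-optimal. The function names are hypothetical.

```python
import math
import random


def h(lam):
    """Binary entropy in bits, with h(0) = h(1) = 0."""
    if lam <= 0.0 or lam >= 1.0:
        return 0.0
    return -lam * math.log2(lam) - (1.0 - lam) * math.log2(1.0 - lam)


def rd_dsbs(p, d1, d2):
    """R(d1, d2) for the DSBS with Hamming distortions, following (38)."""
    d_min = min(d1, d2)
    return h(p) - h(d_min) if d_min < p else 0.0


def xor_reconstruction_demo(p=0.25, flip=0.1, n=10000, seed=0):
    """Empirical Hamming distortions of Z_hat, X_hat and Y_hat."""
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(n)]
    z = [1 if rng.random() < p else 0 for _ in range(n)]
    y = [xi ^ zi for xi, zi in zip(x, z)]
    # Stand-in for the RD code of Example 1 (assumption: i.i.d. bit flips).
    z_hat = [zi ^ (1 if rng.random() < flip else 0) for zi in z]
    x_hat = [zh ^ yi for zh, yi in zip(z_hat, y)]  # receiver 1 uses Y
    y_hat = [zh ^ xi for zh, xi in zip(z_hat, x)]  # receiver 2 uses X

    def dist(a, b):
        return sum(u != v for u, v in zip(a, b)) / n

    return dist(z, z_hat), dist(x, x_hat), dist(y, y_hat)


d_z, d_x, d_y = xor_reconstruction_demo()
print(rd_dsbs(0.25, 0.1, 0.2), d_z, d_x, d_y)
```

The three empirical distortions are identical by construction, which matches the observation that the code operates at the point $R(d_{\min}, d_{\min})$.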

Consider the three functions: the CR-RD function $R_{CR}(d_1, d_2)$, the RD function $R(d_1, d_2)$ and
the cut-set lower bound $R_L(d_1, d_2)$. It is clear that

$$R_{CR}(d_1, d_2) \overset{(a)}{\ge} R(d_1, d_2) \overset{(b)}{\ge} R_L(d_1, d_2), \quad (d_1, d_2) \in \mathbb{R}_+^2, \tag{39}$$

for all sources and distortion measures. Inequality (a) can be strict. For example, in Example 1 there
is zero common information (in the Gács-Körner [17] sense) between $X$ and $Y$ when $p > 0$. This means
that the receivers cannot agree on any non-trivial $\hat{\mathbf{X}}$ and $\hat{\mathbf{Y}}$ without
additional information from the transmitter. Therefore, one would expect that $R_{CR}(d, d)$ cannot
reach $0$ until $d = 0.5$. In contrast, note that $R(d, d) = 0$ for all $d \ge p$ because each receiver
can estimate its reconstruction directly from its side-information; see, for example,
$d_1 = d_2 = 0.25$ in Figure 4.
Fig. 4. The RD function $R(d_1, d_2) = h(p) - h(d_{\min})$ for the doubly symmetric binary source (DSBS)
with cross-over probability $p = 0.25$ and Hamming distortions. This function is developed in Example 1.



The next result shows that both (a) and (b) are equalities for vanishing Hamming distortions. The
proof involves a minor modification of a result by Sgarro [18] (see also Wyner et al. [4, Thm. 1]) and
is omitted for brevity.

Proposition 3 (Sgarro [18]): If $\delta_1$ and $\delta_2$ are Hamming distortion measures, then

$$R_{CR}(0, 0) = R(0, 0) = \max\left\{ H(X|Y),\ H(Y|X) \right\}. \tag{40}$$



Our next result shows that inequalities (a) and (b) are in fact equalities for a non-trivial range of
small distortions. A surface $\mathcal{D}$ in $\mathbb{R}_+^2$ is said to be strictly positive if for
all $(d_1, d_2) \in \mathcal{D}$ we have $d_1 > 0$ and $d_2 > 0$; see, for example, Gray [19]. The next
result is proved in Section IV-E.

Theorem 2: If $q_{XY}$ has support $\mathcal{X} \times \mathcal{Y}$ and $\delta_1$ and $\delta_2$ are
Hamming distortion measures, then there exists a strictly positive surface $\mathcal{D}$ in
$\mathbb{R}_+^2$ such that

$$R_{CR}(d_1, d_2) = R(d_1, d_2) = R_L(d_1, d_2) = \max\left\{ R_{X|Y}(d_1),\ R_{Y|X}(d_2) \right\}, \tag{41}$$

whenever $(d_1, d_2)$ lies on or below $\mathcal{D}$; that is, there exists some
$(d_1', d_2') \in \mathcal{D}$ with $d_1 \le d_1'$ and $d_2 \le d_2'$.

This result is interesting not only because $R(d_1, d_2)$ and $R_{CR}(d_1, d_2)$ both meet the cut-set
lower bound $R_L(d_1, d_2)$ for small distortions. It also gives an explicit characterisation of
$R(d_1, d_2)$ for a class of sources and distortions for which $R(d_1, d_2)$ would be otherwise unknown.

We prove Theorem 2 by matching the cut-set lower bound $R_L(d_1, d_2)$ to the single-letter
characterisation of the CR-RD function $R_{CR}(d_1, d_2)$ given in Theorem 1. An important step in this
proof requires that Gray's extended Shannon lower bounds for joint, conditional and marginal RD
functions [19] are tight. This tightness is only achieved in the small distortion regime.³

The notion of "small distortions" is not vacuous; our next result shows that the set of distortions for
which Theorem 2 holds for the DSBS is in fact quite large. Moreover, the boundary of this set has a
close connection to common information (in Wyner's sense [23]). Let $\mathcal{W}$ be a finite set of
cardinality $|\mathcal{W}| \le 4$ and let [23]

$$K(X;Y) \triangleq \min_{p_{W|XY} \in \mathcal{P}_{W|XY}} I(X,Y;W), \tag{42}$$

where $\mathcal{P}_{W|XY}$ is the set of channels $p_{W|XY}$ mapping $\mathcal{X} \times \mathcal{Y}$ to
$\mathcal{W}$ such that the resulting joint pmf for $(X, Y, W)$ forms the Markov chain
$X \leftrightarrow W \leftrightarrow Y$. The next result is proved in Section IV-F.



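Theorem 3: If $q_{XY}$ is the DSBS with cross-over probability $p \in [0, 1/2]$, $\delta_1$ and
$\delta_2$ are Hamming distortion measures, and

$$d^* \triangleq \frac{1}{2} - \frac{1}{2}\sqrt{1 - 2p}, \tag{43}$$

then the CR-RD function $R_{CR}(d, d)$ satisfies the following:

(i) For all $d \in [0, d^*]$,

$$R_{CR}(d, d) = R(d, d) = h(p) - h(d). \tag{44}$$

(ii) At $d = d^*$,

$$R_{CR}(d^*, d^*) = K(X;Y) - R_X(d^*) \tag{45a}$$

$$\phantom{R_{CR}(d^*, d^*)} = K(X;Y) - R_Y(d^*); \tag{45b}$$

(iii) For all $d \in (d^*, 1/2]$,

$$R_{CR}(d, d) \neq h(p) - h(d), \quad \text{and} \tag{46a}$$

$$R_{CR}(d, d) \le h(d) - p - (1 - p)\, h\!\left( \frac{2d - p}{2(1 - p)} \right). \tag{46b}$$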

In Figure 5, we plot $R(d, d)$, $d^*$, and the upper bound for $R_{CR}(d, d)$ that is given in (46b). It
can be seen from these plots that the threshold $d^*$ is reasonably large, and most interesting
distortion pairs can be achieved by a CR-RD code.
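
The quantities in Theorem 3 are easy to evaluate numerically. The sketch below computes the threshold (43) and the two curves for the three cross-over probabilities used in Figure 5. It assumes that the argument of $h(\cdot)$ in (46b) reads $(2d - p)/(2(1 - p))$; under that reading the bound meets $h(p) - h(d)$ at $d = d^*$ and reaches zero at $d = 1/2$, which the code prints as a sanity check. The function names are illustrative and are not taken from the paper.

```python
import math


def h(lam):
    """Binary entropy in bits, with h(0) = h(1) = 0."""
    if lam <= 0.0 or lam >= 1.0:
        return 0.0
    return -lam * math.log2(lam) - (1.0 - lam) * math.log2(1.0 - lam)


def d_star(p):
    """Threshold distortion d* of (43)."""
    return 0.5 - 0.5 * math.sqrt(1.0 - 2.0 * p)


def r_small(p, d):
    """h(p) - h(d): the value of R(d, d) and R_CR(d, d) on [0, d*], see (44)."""
    return h(p) - h(d)


def rcr_upper(p, d):
    """Upper bound (46b) on R_CR(d, d), under the assumed reading of its h-argument."""
    return h(d) - p - (1.0 - p) * h((2.0 * d - p) / (2.0 * (1.0 - p)))


for p in (0.15, 0.30, 0.40):
    ds = d_star(p)
    print(p, ds, r_small(p, ds), rcr_upper(p, ds), rcr_upper(p, 0.5))
```

For each value of $p$, the third and fourth printed numbers coincide, so the upper bound continues the curve $h(p) - h(d)$ from the threshold onward.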

B. Main Results for Joint Source-Channel Coding 

Our next result characterises joint source-channel coding rates with common reconstructions. It is the
joint source-channel coding extension of Theorem 1.

³ We note in passing that Shannon lower bounds are often used to prove small-distortion results; for example, see [20]-[22].
Fig. 5. The RD function $R(d_1, d_2)$ as well as an upper bound for the CR-RD function
$R_{CR}(d_1, d_2)$ are plotted for the DSBS with cross-over probability $p$. We consider three different
values of $p$: in Figure 5(a) we have $p = 0.15$, in Figure 5(b) we have $p = 0.30$ and in Figure 5(c)
we have $p = 0.40$. The RD function $R(d_1, d_2)$ is identified by a solid line, the upper bound for
$R_{CR}(d_1, d_2)$ is identified by a dotted line, and the threshold $d^*$ is identified by a vertical
solid line. In all three plots we have set $d_1 = d_2 = d$.



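Theorem 4: A distortion pair $(d_1, d_2) \in \mathbb{R}_+^2$ is achievable with common reconstructions
and bandwidth expansion $\kappa$ if and only if there exists a pmf $p_W$ on $\mathcal{W}$ and
$p_{\hat{X}\hat{Y}|XY} \in \mathcal{P}_{\hat{X}\hat{Y}|XY}(d_1, d_2)$ such that

$$I(X;\hat{X},\hat{Y}|Y) \le \kappa\, I(W;U) \quad \text{and} \tag{47a}$$

$$I(Y;\hat{X},\hat{Y}|X) \le \kappa\, I(W;V). \tag{47b}$$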

As was the case for source coding, characterising joint source-channel coding rates without common
reconstructions (i.e., Definition 3) is difficult, and we have succeeded only in giving complete results
for a few special cases. The next proposition reviews a special case that is known in the literature.
This proposition follows from Tuncel [24, Thm. 6], and it can be thought of as the joint source-channel
coding extension of Sgarro's result (Proposition 3).

Proposition 4 (Tuncel [24]): Suppose $\delta_1$ and $\delta_2$ are Hamming distortion measures. Zero
distortion is achievable with bandwidth expansion $\kappa$ if and only if there exists a pmf $p_W$ on
$\mathcal{W}$ such that

$$H(X|Y) \le \kappa\, I(W;U) \quad \text{and} \tag{48a}$$

$$H(Y|X) \le \kappa\, I(W;V). \tag{48b}$$
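
As a small worked instance of (48), the sketch below assumes a DSBS source with cross-over probability `p` and a broadcast channel whose two marginal channels are binary symmetric with cross-over probabilities `e1` and `e2`. This channel model is an illustrative assumption and is not taken from the paper. For such a channel the uniform input pmf maximises both $I(W;U) = 1 - h(e_1)$ and $I(W;V) = 1 - h(e_2)$, so a suitable $p_W$ exists exactly when the uniform one satisfies both inequalities. For the DSBS, $H(X|Y) = H(Y|X) = h(p)$.

```python
import math


def h(lam):
    """Binary entropy in bits, with h(0) = h(1) = 0."""
    if lam <= 0.0 or lam >= 1.0:
        return 0.0
    return -lam * math.log2(lam) - (1.0 - lam) * math.log2(1.0 - lam)


def zero_distortion_feasible(p, e1, e2, kappa):
    """Check (48a) and (48b) for a DSBS(p) source and BSC(e1), BSC(e2) marginals."""
    cap_u = 1.0 - h(e1)  # I(W;U) under the uniform input pmf
    cap_v = 1.0 - h(e2)  # I(W;V) under the uniform input pmf
    return h(p) <= kappa * cap_u and h(p) <= kappa * cap_v


def min_bandwidth_expansion(p, e1, e2):
    """Smallest kappa satisfying (48) in this assumed setting."""
    return h(p) / min(1.0 - h(e1), 1.0 - h(e2))


kappa_min = min_bandwidth_expansion(0.25, 0.05, 0.10)
print(kappa_min, zero_distortion_feasible(0.25, 0.05, 0.10, 2.0))
```

In this setting the weaker of the two channels alone determines the smallest usable bandwidth expansion.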
Tuncel's result is ideal because it characterises achievability simply and explicitly; it does not
require auxiliary random variables and difficult optimization problems to be solved. The following
consequences of this result are worth noting: (i) the physical separation of source and channel codes is
suboptimal;⁴ (ii) an optimal joint source-channel code exhibits a "partial" separation of source and
channel coding at the transmitter, which results in the separation of source and channel random
variables in (48); (iii) an optimal joint source-channel code exploits randomness in the broadcast
channel to perform a "virtual binning," which is analogous to the random binning used in the proof of
Proposition 3; (iv) if the broadcast channel is such that the same $p_W$ maximises $I(W;U)$ and
$I(W;V)$, then all channels can be used to full capacity. This last property is not shared by broadcast
channels in general.

Like Sgarro's result for lossless source coding (Proposition 3), Tuncel's result does not easily extend
to more general distortion measures and distortions. This difficulty is evidenced by the growing body of
work [26]-[30] concerning the lossy extension of [24]. Our next result gives necessary conditions for a
distortion pair to be achievable. It is the joint source-channel coding extension of the cut-set lower
bound $R_L(d_1, d_2)$ for $R(d_1, d_2)$; see Proposition 2. A proof of this result is given in
Section IV.

Theorem 5: If $(d_1, d_2) \in \mathbb{R}_+^2$ is achievable with bandwidth expansion $\kappa$, then
there exists a pmf $p_W$ on $\mathcal{W}$ such that

$$R_{X|Y}(d_1) \le \kappa\, I(W;U) \quad \text{and} \tag{49a}$$

$$R_{Y|X}(d_2) \le \kappa\, I(W;V). \tag{49b}$$

In the Hamming distortion setting, we have that $R_{X|Y}(0) = H(X|Y)$ and $R_{Y|X}(0) = H(Y|X)$.
Therefore, Theorem 5 gives the necessary ("only if") condition of Proposition 4. Similarly, in the
high-distortion regime $d_2 = d_{2,\max}$ we have that $R_{Y|X}(d_2) = 0$ and (49b) is satisfied by any
$p_W$. We are left with (49a), which is the necessary condition of Shannon's joint source-channel coding
theorem [31, Thm. 9.2.2]. It is an open problem as to whether the conditions of Theorem 5 are both
necessary and sufficient. The next result shows that these conditions are necessary and sufficient for
small distortions.

Theorem 6: Suppose $q_{XY}$ has support $\mathcal{X} \times \mathcal{Y}$ and $\delta_1$ and $\delta_2$
are Hamming distortion measures. There exists a strictly positive surface $\mathcal{D}$ in
$\mathbb{R}_+^2$ such that every $(d_1, d_2)$ on or below $\mathcal{D}$ is achievable with bandwidth
expansion $\kappa$ if and only if there exists a pmf $p_W$ on $\mathcal{W}$ such that (49) holds.

The proof of Theorem 6 follows in a similar manner to the proof of Theorem 2. Specifically, we match
the single-letter characterisation of Theorem 4 with the necessary conditions in Theorem 5.

⁴ When considering separate source and channel codes, Tuncel [24] assumed that the side-information
present at each receiver is not used in the channel code. This assumption is appropriate in [24] because
the side-information can be arbitrarily distributed. However, in Proposition 4 the side-information
takes a particular "complementary" form, and in some circumstances it may be appropriate to use this
side-information in the channel code; for example, see [25].
IV. Source Coding: Auxiliary Results & Proofs

A. Approximating $R(d_1, d_2)$

We have already reviewed the cut-set lower bound

$$R(d_1, d_2) \ge R_L(d_1, d_2) = \max\left\{ R_{X|Y}(d_1),\ R_{Y|X}(d_2) \right\} \tag{50}$$

in Section III (Proposition 2). We now review an upper bound for $R(d_1, d_2)$ that, together with
$R_L(d_1, d_2)$, gives a good approximation of $R(d_1, d_2)$.



Let

$$R_U(d_1, d_2) \triangleq \max\left\{ R^{WZ}_{X|Y}(d_1),\ R^{WZ}_{Y|X}(d_2) \right\}. \tag{51}$$

Su and El Gamal [15] called this bound the compress-linear upper bound; the reason will become clear
shortly. If $\delta_1$ and $\delta_2$ are difference distortion measures, let

$$C(d_1, d_2) \triangleq \max\left\{ C_X(d_1),\ C_Y(d_2) \right\}. \tag{52}$$

The next result bounds $R(d_1, d_2)$ from above and below, and it approximates $R(d_1, d_2)$ when
$\delta_1$ and $\delta_2$ are difference distortion measures.

Theorem 7: For $(d_1, d_2) \in \mathbb{R}_+^2$, we have that [15, Thm. 2]

$$R_L(d_1, d_2) \le R(d_1, d_2) \le R_U(d_1, d_2). \tag{53}$$

If $\delta_1$ and $\delta_2$ are difference distortion measures, then

$$R_U(d_1, d_2) - R_L(d_1, d_2) \le C(d_1, d_2). \tag{54}$$



The minimax capacity bound ( |54| ) shows that the gap between Ri(di,d 2 ) and Rjj(di,d 2 ) cannot be 
arbitrarily large (9j. The inequalities in ( |53] ) were obtained independently and contemporaneously by Su 
and El. Gamal in p~5fl . This proof of Theorem [7] is relevant to the following discussion, so it is worthwhile 



to give a brief outline. 

Proof: The fact that $R(d_1,d_2) \ge R_L(d_1,d_2)$ follows from the cut-set argument given in the introduction.
To show $R(d_1,d_2) \le R_U(d_1,d_2)$ we combine two Wyner-Ziv codes with a simple linear-network
code. At the transmitter, $X$ is mapped to a binary vector using an optimal Wyner-Ziv code [13]. This code
treats $Y$ as side-information at receiver 1, but it ignores $Y$ at the transmitter. Similarly, $Y$ is mapped
to a binary vector using a Wyner-Ziv code that treats $X$ as side-information at receiver 2, but it ignores
$X$ at the transmitter. The transmitter sends the modulo-two sum of these codewords (in the same way
as Example 1) over the noiseless BC, and each receiver recovers its desired codeword by eliminating
(subtracting) the codeword destined for the other receiver. It is possible to perform this elimination
because each receiver can calculate (from its side-information) the Wyner-Ziv codeword intended for the
other receiver. Note that if conditional RD codes were used in place of Wyner-Ziv codes, then each receiver
could not calculate the codeword intended for the other user and this elimination would not be possible. The second
result (54) follows directly from Zamir's work on rate-loss in the Wyner-Ziv problem [9]. $R_U(d_1,d_2)$ is
called the compress-linear upper bound because it is obtained by combining two Wyner-Ziv compression
codes with a linear-network code. ■
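The elimination step in the proof can be illustrated with a small sketch. This is only a toy model under stated assumptions: the two Wyner-Ziv codewords are replaced by random byte strings (hypothetical stand-ins, no actual Wyner-Ziv encoder is implemented), and the helper name `xor_bytes` is ours. It shows that the broadcast length is the maximum of the two codeword lengths, and that a receiver who can recompute the other codeword recovers its own.

```python
import secrets

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise modulo-two sum; the shorter input is zero-padded."""
    n = max(len(a), len(b))
    a = a.ljust(n, b"\x00")
    b = b.ljust(n, b"\x00")
    return bytes(x ^ y for x, y in zip(a, b))

# Hypothetical stand-ins for the two Wyner-Ziv codewords (bin indices).
wz_for_rx1 = secrets.token_bytes(4)  # describes X, wanted by receiver 1
wz_for_rx2 = secrets.token_bytes(6)  # describes Y, wanted by receiver 2

# The relay broadcasts a single word of length max(4, 6) bytes.
broadcast = xor_bytes(wz_for_rx1, wz_for_rx2)

# Receiver 1 knows Y, so it can recompute wz_for_rx2 and cancel it.
recovered_1 = xor_bytes(broadcast, wz_for_rx2)[:len(wz_for_rx1)]
# Receiver 2 knows X, so it can recompute wz_for_rx1 and cancel it.
recovered_2 = xor_bytes(broadcast, wz_for_rx1)[:len(wz_for_rx2)]
```

The cancellation only works because each codeword is a function of one source alone, which is the reason the proof uses Wyner-Ziv codes rather than conditional RD codes.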

The gap between $R_U(d_1,d_2)$ and $R_L(d_1,d_2)$ can be no larger than the "rate loss" of the Wyner-Ziv
RD function over the conditional RD function. If $q_{XY}$, $\delta_1$ and $\delta_2$ are such that there is no rate loss,
then Theorem 7 characterises $R(d_1,d_2)$. The following examples outline a number of such scenarios.

Corollary 7.1 (Conditional Independence): If $X = (X',U)$ and $Y = (Y',U)$ where $X' \leftrightarrow U \leftrightarrow Y'$
forms a Markov chain, then for all distortion pairs $(d_1,d_2) \in \mathbb{R}^2_+$ we have that

$$R(d_1,d_2) = R_L(d_1,d_2) \qquad (55)$$

$$= R_U(d_1,d_2) \qquad (56)$$

$$= \max\{R_{X|U}(d_1), R_{Y|U}(d_2)\} . \qquad (57)$$
In particular, if $X$ and $Y$ are independent, then we have

$$R(d_1,d_2) = \max\{R_X(d_1), R_Y(d_2)\} . \qquad (58)$$

Proof: If $X = (X',U)$ and $Y = (Y',U)$ where $X' \leftrightarrow U \leftrightarrow Y'$ forms a Markov chain, then $X \leftrightarrow U \leftrightarrow Y$
also forms a Markov chain. Moreover, we have

$$R_{X|Y}(d_1) \overset{(a)}{\ge} R_{X|U}(d_1) \overset{(b)}{=} R^{WZ}_{X|U}(d_1) \overset{(c)}{\ge} R^{WZ}_{X|Y}(d_1) , \qquad (59)$$

where (a) follows from the Markov chain $X \leftrightarrow U \leftrightarrow Y$, (b) follows because $X = (X',U)$, and (c) follows
because $Y = (Y',U)$. On combining (59) with the fact that $R^{WZ}_{X|Y}(d_1) \ge R_{X|Y}(d_1)$, it follows that
$R^{WZ}_{X|Y}(d_1) = R_{X|Y}(d_1)$. A similar argument yields $R^{WZ}_{Y|X}(d_2) = R_{Y|X}(d_2)$. Substituting these equalities
into the definitions of $R_L(d_1,d_2)$ and $R_U(d_1,d_2)$, and applying Theorem 7 completes the proof. ■

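Corollary 7.2 (Two Deterministic Reconstructions): If $\mathcal{U}$ and $\mathcal{V}$ are finite sets, $\psi_x : \mathcal{X} \to \mathcal{U}$ and
$\psi_y : \mathcal{Y} \to \mathcal{V}$ are mappings, $U = \psi_x(X)$, $V = \psi_y(Y)$, $\hat{\mathcal{X}} = \mathcal{U}$, $\hat{\mathcal{Y}} = \mathcal{V}$,

$$\delta_1(x,\hat u) = \begin{cases} 0, & \text{if } \hat u = \psi_x(x) \\ 1, & \text{otherwise,} \end{cases} \qquad (60)$$

$$\delta_2(y,\hat v) = \begin{cases} 0, & \text{if } \hat v = \psi_y(y) \\ 1, & \text{otherwise,} \end{cases} \qquad (61)$$

then we have that

$$R(0,0) = \max\{H(U|Y), H(V|X)\} . \qquad (62)$$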
5 The side-information $U$ is a component of the source; therefore, $R^{WZ}_{X|U}(d_1)$ and $R_{X|U}(d_1)$ are equal.

Proof: The conditional RD function $R_{X|Y}(d_1)$ and the Wyner-Ziv RD function $R^{WZ}_{X|Y}(d_1)$ are both
continuous at $d_1 = 0$ (see footnote 6). We have that

$$R_{X|Y}(0) = \min I(X;\hat U|Y) , \qquad (63)$$

where the minimum is taken over all channels $p_{\hat U|XY}$ with

$$\sum_{(\hat u,x,y) \in \hat{\mathcal{U}} \times \mathcal{X} \times \mathcal{Y}} p_{\hat U|XY}(\hat u|x,y)\, q_{XY}(x,y)\, \delta_1(x,\hat u) = 0 . \qquad (64)$$

Suppose that $p_{\hat U|XY}$ achieves the above minimum. Since $\delta_1(x,\hat u) = 0$ when $\psi_x(x) = \hat u$ and $\delta_1(x,\hat u) = 1$
when $\psi_x(x) \ne \hat u$, (64) implies that when $q_{XY}(x,y) > 0$ we have that $p_{\hat U|XY}$ must satisfy

$$p_{\hat U|XY}(\hat u|x,y) = \begin{cases} 1, & \text{if } \hat u = \psi_x(x) \\ 0, & \text{otherwise.} \end{cases} \qquad (65)$$

That is, $\hat U = U$ almost surely. Therefore $H(\hat U|X,Y) = 0$ and $R_{X|Y}(0) = H(\hat U|Y) = H(U|Y)$. We also
have that

$$R^{WZ}_{X|Y}(0) = \min I(X;A|Y) , \qquad (66)$$

where the minimization is taken over all choices of an auxiliary random variable $A$ with a joint pmf
$p_{AXY}$ satisfying the Markov chain $A \leftrightarrow X \leftrightarrow Y$ and the distortion constraint

$$\sum_{a,x,y} p_{AXY}(a,x,y)\, \delta_1(x,\hat u) = 0 , \qquad (67)$$

where $\hat u = \pi_1(a,y)$. Setting $A = U = \psi_x(X)$ gives $R^{WZ}_{X|Y}(0) \le H(U|Y)$ and therefore $R^{WZ}_{X|Y}(0) =
R_{X|Y}(0) = H(U|Y)$. A similar argument gives $R_{Y|X}(0) = R^{WZ}_{Y|X}(0) = H(V|X)$. The proof is completed
by applying Theorem 7. ■
Using standard techniques, Theorem 7 can be extended from discrete finite alphabets to real-valued
alphabets [33]. This extension yields the following example for jointly Gaussian sources.



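Example 2 (Jointly Gaussian): If $\mathcal{X} = \mathbb{R}$, $\mathcal{Y} = \mathbb{R}$ and

$$q_{XY}(x,y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}} \exp\left\{ -\frac{1}{2(1-\rho^2)} \left[ \left(\frac{x-m_x}{\sigma_x}\right)^2 + \left(\frac{y-m_y}{\sigma_y}\right)^2 - \frac{2\rho(x-m_x)(y-m_y)}{\sigma_x\sigma_y} \right] \right\} \qquad (68)$$

and

$$\delta_1(x,\hat x) = (x-\hat x)^2 \qquad (69a)$$

$$\delta_2(y,\hat y) = (y-\hat y)^2 , \qquad (69b)$$

then for all distortion pairs $(d_1,d_2) \in \mathbb{R}^2_+$ we have

$$R(d_1,d_2) = \max\{R_{X|Y}(d_1), R_{Y|X}(d_2)\} , \qquad (70)$$

where [13]

$$R^{WZ}_{X|Y}(d_1) = R_{X|Y}(d_1) , \qquad (71a)$$

$$R^{WZ}_{Y|X}(d_2) = R_{Y|X}(d_2) , \qquad (71b)$$

and

$$R_{X|Y}(d_1) = \begin{cases} \frac{1}{2}\log_2\frac{\sigma_x^2(1-\rho^2)}{d_1}, & \text{if } 0 < d_1 < \sigma_x^2(1-\rho^2) \\ 0, & \text{otherwise,} \end{cases} \qquad (72a)$$

$$R_{Y|X}(d_2) = \begin{cases} \frac{1}{2}\log_2\frac{\sigma_y^2(1-\rho^2)}{d_2}, & \text{if } 0 < d_2 < \sigma_y^2(1-\rho^2) \\ 0, & \text{otherwise.} \end{cases} \qquad (72b)$$

6 The Wyner-Ziv rate-distortion function was shown to be continuous at $d = 0$ by Willems. The continuity of the
conditional rate-distortion function $R_{X|Y}(d_1)$ at $d_1 = 0$ follows from Willems' result because $R_{X|Y}(d_1)$ is a special case of
the Wyner-Ziv rate-distortion function when the source and distortion measure are chosen appropriately.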

Remark 2: Corollaries 7.1 and 7.2 include the results of [15, Sec. III.B] as special cases. Example 2
was independently given in [15].
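As a numerical illustration of Example 2, the following sketch evaluates the maximum of the two Gaussian conditional RD functions. It assumes the standard closed form for jointly Gaussian sources with squared-error distortion (conditional variance $\sigma^2(1-\rho^2)$, rates in bits); the function and parameter names are ours and purely illustrative.

```python
import math

def gaussian_conditional_rd(var: float, rho: float, d: float) -> float:
    """Conditional RD function (bits) of a Gaussian source with variance
    `var`, given side-information with correlation coefficient `rho`."""
    if d <= 0.0:
        return math.inf  # zero squared error needs unbounded rate
    cond_var = var * (1.0 - rho ** 2)
    if d < cond_var:
        return 0.5 * math.log2(cond_var / d)
    return 0.0

def broadcast_rd(var_x, var_y, rho, d1, d2):
    """Maximum of the two conditional RD functions (assumed form of R(d1, d2))."""
    return max(gaussian_conditional_rd(var_x, rho, d1),
               gaussian_conditional_rd(var_y, rho, d2))

# Unit-variance sources with rho = 0.6: conditional variance is 0.64.
rate = broadcast_rd(1.0, 1.0, 0.6, 0.16, 0.32)
```

With these numbers the first term is $\frac{1}{2}\log_2 4 = 1$ bit and the second is $\frac{1}{2}\log_2 2 = 0.5$ bit, so the more demanding receiver sets the broadcast rate.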

The next result characterises $R(d_1,d_2)$ for one large distortion and shows that the upper bound
$R_U(d_1,d_2)$ can be loose. Its proof follows directly from the lower bound $R_L(d_1,d_2)$ in Theorem 7
and the coding theorem for the conditional RD function [12]. This proof is omitted.

Corollary 7.3: For $(d_1,d_2) \in \mathbb{R}^2_+$ we have that

$$R(d_1,d_{2,\max}) = R_{X|Y}(d_1) \quad \text{and} \qquad (73a)$$
$$R(d_{1,\max},d_2) = R_{Y|X}(d_2) . \qquad (73b)$$

In summary, the compress-linear upper bound $R_U(d_1,d_2)$ and the cut-set lower bound $R_L(d_1,d_2)$
approximate $R(d_1,d_2)$ well when $\delta_1$ and $\delta_2$ are difference distortion measures. Specifically, the ideas of
Zamir [9] can be used to show that the gap between $R_L(d_1,d_2)$ and $R_U(d_1,d_2)$ is no larger than the
maximum of two minimax capacities. The bounds yield an exact characterisation of $R(d_1,d_2)$ for sources
with zero rate-loss in the Wyner-Ziv problem [9], [13]; however, it is well known that this condition is
very restrictive [13, Remark 5]. Two sources that satisfy this condition are the jointly Gaussian source with
a squared-error distortion measure (see [13, Remark 6] and Example 2) and the erasure side-information
source with a Hamming distortion measure [34], [35]. Corollary 7.3 and Example 1 demonstrated that
the compress-linear upper bound $R_U(d_1,d_2)$ can be loose. We conjecture that $R_L(d_1,d_2)$ is also loose
in general, but no counterexample has been found to date. We give a different lower bound for $R(d_1,d_2)$
in Appendix A.
B. Kimura-Uyematsu and Heegard-Berger Upper Bounds for $R(d_1,d_2)$

In this section, we review an upper bound for $R(d_1,d_2)$ that was proposed by Kimura and Uyematsu
in [16, Thm. 1], and we compare this bound to the compress-linear upper bound $R_U(d_1,d_2)$. We then
formulate a new upper bound for $R(d_1,d_2)$ using a result of Heegard and Berger [36], [37]. The main
purpose of this section is to unify the achievability results of [15], [16], [36], [37].
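Let $\mathcal{C}$ be a finite set of cardinality

$$|\mathcal{C}| \le |\mathcal{X}|\,|\mathcal{Y}| + 2 . \qquad (74)$$

Let $\mathcal{P}_{C|XY}(d_1,d_2)$ denote the set of channels $p_{C|XY}$ randomly mapping $\mathcal{X} \times \mathcal{Y}$ to $\mathcal{C}$ such that there exist
functions $\pi_1 : \mathcal{C} \times \mathcal{Y} \to \hat{\mathcal{X}}$ and $\pi_2 : \mathcal{C} \times \mathcal{X} \to \hat{\mathcal{Y}}$ with

$$\sum_{x,y,c} p_{C|XY}(c|x,y)\, q_{XY}(x,y)\, \delta_1(x,\pi_1(c,y)) \le d_1 \quad \text{and} \qquad (75a)$$
$$\sum_{x,y,c} p_{C|XY}(c|x,y)\, q_{XY}(x,y)\, \delta_2(y,\pi_2(c,x)) \le d_2 . \qquad (75b)$$

Define

$$R^*_U(d_1,d_2) \triangleq \min_{p_{C|XY} \in \mathcal{P}_{C|XY}(d_1,d_2)} \max\{I(X;C|Y), I(Y;C|X)\} . \qquad (76)$$

Lemma 1 ([16, Thm. 1]): For $(d_1,d_2) \in \mathbb{R}^2_+$ we have that

$$R(d_1,d_2) \le R^*_U(d_1,d_2) . \qquad (77)$$

Lemma 1 is called the one-description upper bound because its proof follows from a random coding
argument that describes both $X$ and $Y$ with one description.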

The one-description bound $R^*_U(d_1,d_2)$ and the compress-linear bound $R_U(d_1,d_2)$ both involve difficult
minimizations, so it is not immediately clear when one bound outperforms the other. The next result
resolves this question and shows that $R^*_U(d_1,d_2)$ is never worse than $R_U(d_1,d_2)$.

Lemma 2: For $(d_1,d_2) \in \mathbb{R}^2_+$, we have that

$$R(d_1,d_2) \le R^*_U(d_1,d_2) \le R_U(d_1,d_2) . \qquad (78)$$

Proof: We have that

$$R_U(d_1,d_2) = \max\{R^{WZ}_{X|Y}(d_1), R^{WZ}_{Y|X}(d_2)\} \qquad (79)$$
$$= \max\{\min I(X;A|Y), \min I(Y;B|X)\} , \qquad (80)$$

where the auxiliary random variables $A$ and $B$ satisfy the Markov chains $A \leftrightarrow X \leftrightarrow Y$ and $B \leftrightarrow Y \leftrightarrow X$.
Note that $A$ and $B$ do not appear together in any of the mutual information or distortion conditions, so
we can combine these minima into a minimum where $A \leftrightarrow (X,Y) \leftrightarrow B$ forms a Markov chain. To this
end, let $\mathcal{P}_{AB|XY}(d_1,d_2)$ denote the set of channels $p_{AB|XY}$ mapping $\mathcal{X} \times \mathcal{Y}$ to $\mathcal{A} \times \mathcal{B}$ such that the
following properties hold:

1) The joint distribution, $p_{AB|XY}(a,b|x,y)\, q_{XY}(x,y)$, factors to form the long Markov chain $A \leftrightarrow X \leftrightarrow Y \leftrightarrow B$.

2) There exist functions $\pi_x : \mathcal{A} \times \mathcal{Y} \to \hat{\mathcal{X}}$ and $\pi_y : \mathcal{B} \times \mathcal{X} \to \hat{\mathcal{Y}}$ such that

$$\sum_{(a,b,x,y)} p_{AB|XY}(a,b|x,y)\, q_{XY}(x,y)\, \delta_1(x,\pi_x(a,y)) \le d_1 , \qquad (81a)$$

$$\sum_{(a,b,x,y)} p_{AB|XY}(a,b|x,y)\, q_{XY}(x,y)\, \delta_2(y,\pi_y(b,x)) \le d_2 . \qquad (81b)$$

Note that the long Markov chain $A \leftrightarrow X \leftrightarrow Y \leftrightarrow B$ in condition 1 is implied by the Markov chains
$A \leftrightarrow (X,Y) \leftrightarrow B$, $A \leftrightarrow X \leftrightarrow Y$ and $B \leftrightarrow Y \leftrightarrow X$. We now have that

$$R_U(d_1,d_2) = \min_{p_{AB|XY} \in \mathcal{P}_{AB|XY}(d_1,d_2)} \max\{I(X;A|Y), I(Y;B|X)\} . \qquad (82)$$

The constraint $A \leftrightarrow X \leftrightarrow Y \leftrightarrow B$ implies $(A,X) \leftrightarrow Y \leftrightarrow B$ which, in turn, implies $X \leftrightarrow (A,Y) \leftrightarrow B$. Therefore,
we have

$$I(X;A|Y) = H(X|Y) - H(X|A,Y) \qquad (83)$$
$$= H(X|Y) - H(X|A,B,Y) \qquad (84)$$
$$= I(X;A,B|Y) . \qquad (85)$$

Similarly, we have

$$I(Y;B|X) = I(Y;A,B|X) . \qquad (86)$$

Combining (82) with (85) and (86) completes the proof:

$$R_U(d_1,d_2) = \min_{p_{AB|XY} \in \mathcal{P}_{AB|XY}(d_1,d_2)} \max\{I(X;A,B|Y), I(Y;A,B|X)\} \qquad (87)$$
$$\ge \min_{p_{C|XY} \in \mathcal{P}_{C|XY}(d_1,d_2)} \max\{I(X;C|Y), I(Y;C|X)\} , \qquad (88)$$

where (88) follows because $\mathcal{P}_{C|XY}(d_1,d_2) \supseteq \mathcal{P}_{AB|XY}(d_1,d_2)$. ■

The results of Heegard and Berger [36, Thm. 2] (see also [37]) can be modified to further strengthen
the one-description upper bound. Let $\mathcal{P}_{C|XY}$ denote the set of all channels $p_{C|XY}$ mapping $\mathcal{X} \times \mathcal{Y}$
to $\mathcal{C}$. For $(d_1,d_2) \in \mathbb{R}^2_+$, define

$$R^{\dagger}_U(d_1,d_2) \triangleq \min_{p_{C|XY} \in \mathcal{P}_{C|XY}} \Big[ \max\{I(X;C|Y), I(Y;C|X)\} + R_{X|CY}(d_1) + R_{Y|CX}(d_2) \Big] , \qquad (89)$$

where $R_{X|CY}(d_1)$ and $R_{Y|CX}(d_2)$ are the conditional RD functions of $X$ given $(C,Y)$ and $Y$ given
$(C,X)$, respectively. The proof of the next result follows directly from [36, Thm. 2] and Lemma 2 and
is omitted.

Theorem 8: For $(d_1,d_2) \in \mathbb{R}^2_+$, we have that

$$R(d_1,d_2) \le R^{\dagger}_U(d_1,d_2) \le R^*_U(d_1,d_2) \le R_U(d_1,d_2) .$$

In summary, the compress-linear upper bound $R_U(d_1,d_2)$ and the cut-set lower bound $R_L(d_1,d_2)$
approximate $R(d_1,d_2)$ well for difference distortion measures. The compress-linear bound is weaker than the
one-description bound, i.e. $R_U(d_1,d_2) \ge R^*_U(d_1,d_2)$, and this inequality is strict for the DSBS with
Hamming distortion measures (Example 1). Finally, the one-description bound is potentially weaker than
Heegard and Berger's bound, i.e. $R^*_U(d_1,d_2) \ge R^{\dagger}_U(d_1,d_2)$; however, we have not found an example
where this inequality is strict.

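C. Proof of Theorem 1

In Theorem 1 we claimed that the CR-RD function $R_{CR}(d_1,d_2)$ is equal to $R^*_{CR}(d_1,d_2)$. We now
prove this result.

Proof: The coding theorem is a special case of the one-description bound, where $C$ is chosen to be
$(\hat X,\hat Y)$. We omit the proof. It remains to prove the converse theorem. If $r$ is $(d_1,d_2)$-admissible, then
by definition there exists the following:

1) a monotonically decreasing sequence $\{\epsilon_i\}$ with $\lim_{i\to\infty} \epsilon_i = 0$, and a monotonically increasing
sequence $\{n_i\}$;

2) a sequence of common-reconstruction RD codes $\{(f^{(n_i)}, g_1^{(n_i)}, g_2^{(n_i)}, \phi_1^{(n_i)}, \phi_2^{(n_i)})\}$, where $\kappa^{(n_i)} < r + \epsilon_i$,
$\Delta_1^{(n_i)} < d_1 + \epsilon_i$, $\Delta_2^{(n_i)} < d_2 + \epsilon_i$, $\Pr[\phi_2^{(n_i)}(M,\mathbf{X}) \ne g_1^{(n_i)}(M,\mathbf{Y})] < \epsilon_i$, and
$\Pr[\phi_1^{(n_i)}(M,\mathbf{Y}) \ne g_2^{(n_i)}(M,\mathbf{X})] < \epsilon_i$.

We now show that $r + \epsilon_i \ge R^*_{CR}(d_1 + \epsilon_i, d_2 + \epsilon_i) - \epsilon(n_i,\epsilon_i)$ for all $i$, where $\lim_{i\to\infty} \epsilon(n_i,\epsilon_i) = 0$. To
this end, the following inequalities will be useful:

$$\epsilon(n_i,\epsilon_i) \ge \frac{1}{n_i} H\big(g_2^{(n_i)}(M,\mathbf{X}) \,\big|\, g_1^{(n_i)}(M,\mathbf{Y}), \phi_1^{(n_i)}(M,\mathbf{Y}), \mathbf{Y}\big) \quad \text{and} \qquad (90a)$$
$$\epsilon(n_i,\epsilon_i) \ge \frac{1}{n_i} H\big(g_1^{(n_i)}(M,\mathbf{Y}) \,\big|\, g_2^{(n_i)}(M,\mathbf{X}), \phi_2^{(n_i)}(M,\mathbf{X}), \mathbf{X}\big) , \qquad (90b)$$

where

$$\epsilon(n_i,\epsilon_i) \triangleq \frac{1}{n_i} + \epsilon_i \log_2 |\hat{\mathcal{X}} \times \hat{\mathcal{Y}}| . \qquad (91)$$

These inequalities are a consequence of Fano's inequality [11], the common-reconstruction property

$$\Pr[\phi_1^{(n_i)}(M,\mathbf{Y}) \ne g_2^{(n_i)}(M,\mathbf{X})] < \epsilon_i \quad \text{and} \qquad (92a)$$
$$\Pr[\phi_2^{(n_i)}(M,\mathbf{X}) \ne g_1^{(n_i)}(M,\mathbf{Y})] < \epsilon_i , \qquad (92b)$$

and the fact that the cardinality of the range of $\phi_k^{(n_i)}$, $k = 1,2$, can be no more than $|\hat{\mathcal{X}} \times \hat{\mathcal{Y}}|^{n_i}$. Note that
$\lim_{i\to\infty} \epsilon(n_i,\epsilon_i) = 0$. By definition, we also have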

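$$r + \epsilon_i \ge \kappa^{(n_i)} = \frac{1}{n_i} \log_2 |\mathcal{M}^{(n_i)}| \qquad (93)$$
$$\ge \frac{1}{n_i} H(M) \qquad (94)$$
$$\ge \frac{1}{n_i} H(M|\mathbf{Y}) \qquad (95)$$
$$= \frac{1}{n_i} H\big(M, g_1^{(n_i)}(M,\mathbf{Y}), \phi_1^{(n_i)}(M,\mathbf{Y}) \,\big|\, \mathbf{Y}\big) \qquad (96)$$
$$\ge \frac{1}{n_i} H\big(g_1^{(n_i)}(M,\mathbf{Y}), \phi_1^{(n_i)}(M,\mathbf{Y}) \,\big|\, \mathbf{Y}\big) \qquad (97)$$
$$= \frac{1}{n_i} \Big[ H\big(g_1^{(n_i)}(M,\mathbf{Y}), \phi_1^{(n_i)}(M,\mathbf{Y}), g_2^{(n_i)}(M,\mathbf{X}) \,\big|\, \mathbf{Y}\big) - H\big(g_2^{(n_i)}(M,\mathbf{X}) \,\big|\, g_1^{(n_i)}(M,\mathbf{Y}), \phi_1^{(n_i)}(M,\mathbf{Y}), \mathbf{Y}\big) \Big] \qquad (98)$$
$$\ge \frac{1}{n_i} H\big(g_1^{(n_i)}(M,\mathbf{Y}), \phi_1^{(n_i)}(M,\mathbf{Y}), g_2^{(n_i)}(M,\mathbf{X}) \,\big|\, \mathbf{Y}\big) - \epsilon(n_i,\epsilon_i) \qquad (99)$$
$$\ge \frac{1}{n_i} H\big(g_1^{(n_i)}(M,\mathbf{Y}), g_2^{(n_i)}(M,\mathbf{X}) \,\big|\, \mathbf{Y}\big) - \epsilon(n_i,\epsilon_i) \qquad (100)$$
$$\ge \frac{1}{n_i} I\big(\mathbf{X}; g_1^{(n_i)}(M,\mathbf{Y}), g_2^{(n_i)}(M,\mathbf{X}) \,\big|\, \mathbf{Y}\big) - \epsilon(n_i,\epsilon_i) \qquad (101)$$
$$= \frac{1}{n_i} \sum_{j=1}^{n_i} I\big(X_j; g_1^{(n_i)}(M,\mathbf{Y}), g_2^{(n_i)}(M,\mathbf{X}) \,\big|\, \mathbf{Y}, X_1^{j-1}\big) - \epsilon(n_i,\epsilon_i) \qquad (102)$$
$$= \frac{1}{n_i} \sum_{j=1}^{n_i} I\big(X_j; g_1^{(n_i)}(M,\mathbf{Y}), g_2^{(n_i)}(M,\mathbf{X}), X_1^{j-1}, Y_1^{j-1}, Y_{j+1}^{n_i} \,\big|\, Y_j\big) - \epsilon(n_i,\epsilon_i) \qquad (103)$$
$$\ge \frac{1}{n_i} \sum_{j=1}^{n_i} I\big(X_j; g_1^{(n_i)}(M,\mathbf{Y}), g_2^{(n_i)}(M,\mathbf{X}) \,\big|\, Y_j\big) - \epsilon(n_i,\epsilon_i) , \qquad (104)$$

where (93) through (98) follow from standard identities, (99) follows from (90), (100) through (102)
follow from standard identities, and (103) follows because the source is iid.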
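A similar procedure yields

$$r + \epsilon_i \ge \frac{1}{n_i} \sum_{j=1}^{n_i} I\big(Y_j; g_1^{(n_i)}(M,\mathbf{Y}), g_2^{(n_i)}(M,\mathbf{X}) \,\big|\, X_j\big) - \epsilon(n_i,\epsilon_i) . \qquad (105)$$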



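Let $\hat X_j$ and $\hat Y_j$ denote the $j$th elements of $g_1^{(n_i)}(M,\mathbf{Y})$ and $g_2^{(n_i)}(M,\mathbf{X})$, respectively; i.e., $\hat X_j$ and $\hat Y_j$ are
the $j$th symbols reconstructed by the receivers. Expanding the conditions $\Delta_1^{(n_i)} < d_1 + \epsilon_i$ and $\Delta_2^{(n_i)} < d_2 + \epsilon_i$
gives

$$\mathbb{E}\left[ \frac{1}{n_i} \sum_{j=1}^{n_i} \delta_1(X_j,\hat X_j) \right] < d_1 + \epsilon_i , \qquad (106a)$$

$$\mathbb{E}\left[ \frac{1}{n_i} \sum_{j=1}^{n_i} \delta_2(Y_j,\hat Y_j) \right] < d_2 + \epsilon_i . \qquad (106b)$$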



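Recall that $\{(X_j,Y_j)\}$ is drawn i.i.d. according to $q_{XY}(x,y)$. For each $j$, let $p_{\hat X_j \hat Y_j|X_j Y_j}(\hat x_j,\hat y_j|x_j,y_j)$ denote
the conditional probability of $(\hat X_j,\hat Y_j)$ given $(X_j,Y_j)$; that is, combining $p_{\hat X_j \hat Y_j|X_j Y_j}(\hat x_j,\hat y_j|x_j,y_j)$ with
$q_{XY}(x_j,y_j)$ characterises the joint pmf of $(X_j,Y_j,\hat X_j,\hat Y_j)$. Define the "time-shared" channel

$$p_{\hat X\hat Y|XY}(\hat x,\hat y|x,y) \triangleq \frac{1}{n_i} \sum_{j=1}^{n_i} p_{\hat X_j \hat Y_j|X_j Y_j}(\hat x,\hat y|x,y) . \qquad (107)$$

From (106a) and (106b), we have

$$\sum_{x,y,\hat x,\hat y} p_{\hat X\hat Y|XY}(\hat x,\hat y|x,y)\, q_{XY}(x,y)\, \delta_1(x,\hat x) < d_1 + \epsilon_i , \qquad (108)$$

$$\sum_{x,y,\hat x,\hat y} p_{\hat X\hat Y|XY}(\hat x,\hat y|x,y)\, q_{XY}(x,y)\, \delta_2(y,\hat y) < d_2 + \epsilon_i ; \qquad (109)$$

consequently, $p_{\hat X\hat Y|XY}(\hat x,\hat y|x,y) \in \mathcal{P}_{\hat X\hat Y|XY}(d_1 + \epsilon_i, d_2 + \epsilon_i)$. We have that

$$\frac{1}{n_i} \sum_{j=1}^{n_i} I(X_j;\hat X_j,\hat Y_j|Y_j) \ge I(X;\hat X,\hat Y|Y) , \quad \text{and} \qquad (110)$$

$$\frac{1}{n_i} \sum_{j=1}^{n_i} I(Y_j;\hat X_j,\hat Y_j|X_j) \ge I(Y;\hat X,\hat Y|X) , \qquad (111)$$

where we have used Jensen's inequality together with the convexity of $I(X;\hat X,\hat Y|Y)$ and $I(Y;\hat X,\hat Y|X)$
in $p_{\hat X\hat Y|XY}$ when the joint pmf of $(X,Y)$ (here $q_{XY}$) is fixed (see Lemma 3 below). Finally, combining
(104), (105), (110) and (111) with the definition of $R^*_{CR}(d_1,d_2)$ we have

$$r + \epsilon_i \ge \max\{I(X;\hat X,\hat Y|Y), I(Y;\hat X,\hat Y|X)\} - \epsilon(n_i,\epsilon_i) \qquad (112)$$
$$\ge R^*_{CR}(d_1 + \epsilon_i, d_2 + \epsilon_i) - \epsilon(n_i,\epsilon_i) , \qquad (113)$$

which is the desired result.

The converse is completed by noting that $\lim_{i\to\infty} \epsilon_i = 0$, $\lim_{i\to\infty} \epsilon(n_i,\epsilon_i) = 0$, and $R^*_{CR}(d_1,d_2)$ is a
continuous function of $d_1$ and $d_2$. ■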

Lemma 3: Suppose the random vector $(A,B,C)$ on $\mathcal{A} \times \mathcal{B} \times \mathcal{C}$ is characterised by the joint pmf
$p_{ABC}(a,b,c) = p_{C|AB}(c|a,b)\, p_{AB}(a,b)$. The conditional mutual information $I(A;C|B)$ is convex in
$p_{C|AB}(c|a,b)$ for fixed $p_{AB}(a,b)$.
Proof: Fix $p_{AB}$. From the convexity of mutual information [11, Thm. 2.7.4], we have that $I(A;C|B = b)$
is convex in $p_{C|AB}(\cdot|\cdot,b)$ for each $b$. The lemma follows by noting that $I(A;C|B)$ is a convex
combination of $I(A;C|B = b)$. Further details can be found in Appendix B. ■
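Lemma 3 can be sanity-checked numerically. The sketch below is not a proof; it draws random pmfs and channels on small alphabets (alphabet sizes, seed and helper names are our own choices) and measures how far $I(A;C|B)$ of a mixed channel exceeds the corresponding mixture of the two conditional mutual informations. Convexity predicts that this excess is never positive, up to floating-point error.

```python
import math
import random

def cond_mi(p_ab, chan):
    """I(A;C|B) in bits; p_ab[a][b] is the joint pmf, chan[a][b][c] = p(c|a,b)."""
    n_a, n_b, n_c = len(p_ab), len(p_ab[0]), len(chan[0][0])
    total = 0.0
    for b in range(n_b):
        p_b = sum(p_ab[a][b] for a in range(n_a))
        if p_b == 0.0:
            continue
        # p(c|b) = sum_a p(a|b) p(c|a,b)
        p_c_given_b = [
            sum(p_ab[a][b] * chan[a][b][c] for a in range(n_a)) / p_b
            for c in range(n_c)
        ]
        for a in range(n_a):
            for c in range(n_c):
                joint = p_ab[a][b] * chan[a][b][c]
                if joint > 0.0:
                    total += joint * math.log2(chan[a][b][c] / p_c_given_b[c])
    return total

def random_pmf(rng, n):
    w = [rng.random() + 1e-3 for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]

def random_channel(rng, n_a, n_b, n_c):
    return [[random_pmf(rng, n_c) for _ in range(n_b)] for _ in range(n_a)]

def mix(ch0, ch1, lam):
    """Pointwise convex combination lam*ch0 + (1-lam)*ch1 of two channels."""
    return [[[lam * u + (1.0 - lam) * v for u, v in zip(r0, r1)]
             for r0, r1 in zip(row0, row1)]
            for row0, row1 in zip(ch0, ch1)]

def max_convexity_violation(trials=200, seed=0):
    """Largest observed value of I(mix) - [lam*I(ch0) + (1-lam)*I(ch1)]."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        flat = random_pmf(rng, 6)            # p_AB on a 2 x 3 alphabet
        p_ab = [flat[0:3], flat[3:6]]
        ch0 = random_channel(rng, 2, 3, 4)
        ch1 = random_channel(rng, 2, 3, 4)
        lam = rng.random()
        lhs = cond_mi(p_ab, mix(ch0, ch1, lam))
        rhs = lam * cond_mi(p_ab, ch0) + (1.0 - lam) * cond_mi(p_ab, ch1)
        worst = max(worst, lhs - rhs)
    return worst
```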

D. Extreme Distortions

The next result shows that if one source is required to be reconstructed with vanishing Hamming
distortion, then the RD function $R(d_1,d_2)$ and the CR-RD function $R_{CR}(d_1,d_2)$ both collapse to the
cut-set lower bound $R_L(d_1,d_2)$.

Corollary 8.1: If $\delta_1$ is a Hamming distortion measure, then for all $d_2 \in \mathbb{R}_+$ we have that

$$R(0,d_2) = R_{CR}(0,d_2) = \max\{H(X|Y), R_{Y|X}(d_2)\} . \qquad (114)$$

Proof: From Proposition 1 and Theorems 1 and 7 we have that

$$R_L(0,d_2) \le R(0,d_2) \le R_{CR}(0,d_2) = \min_{p_{\hat X\hat Y|XY} \in \mathcal{P}_{\hat X\hat Y|XY}(0,d_2)} \max\{I(X;\hat X,\hat Y|Y), I(Y;\hat X,\hat Y|X)\} . \qquad (115)$$

Let $p_{\hat Y|XY}$ be a channel that achieves the minimum for the conditional RD function $R_{Y|X}(d_2)$. This
channel and $q_{XY}$ together define a joint pmf for $(X,Y,\hat Y)$. In addition, set $\hat X = X$ to obtain a joint pmf
for $(X,Y,\hat X,\hat Y)$. This joint pmf belongs to the set $\mathcal{P}_{\hat X\hat Y|XY}(0,d_2)$. Note that we have the Markov chain
$(Y,\hat Y) \leftrightarrow X \leftrightarrow \hat X$ and therefore the chain $Y \leftrightarrow (X,\hat Y) \leftrightarrow \hat X$. On substituting this joint pmf into (115) we
obtain the following upper bound for $R_{CR}(0,d_2)$:

$$R_{CR}(0,d_2) \le \max\{I(X;\hat X,\hat Y|Y), I(Y;\hat X,\hat Y|X)\} \qquad (116)$$
$$= \max\{H(X|Y) - H(X|Y,\hat X,\hat Y),\; I(Y;\hat Y|X) + I(Y;\hat X|X,\hat Y)\} \qquad (117)$$
$$= \max\{H(X|Y), I(Y;\hat Y|X)\} \qquad (118)$$
$$= \max\{H(X|Y), R_{Y|X}(d_2)\} , \qquad (119)$$

where (118) follows because $\hat X = X$ and $Y \leftrightarrow (X,\hat Y) \leftrightarrow \hat X$ forms a Markov chain, and (119) follows
because $p_{\hat Y|XY}$ was chosen to achieve the minimum in the definition of $R_{Y|X}(d_2)$. The proof is completed
by noting that

$$R_L(0,d_2) \triangleq \max\{R_{X|Y}(0), R_{Y|X}(d_2)\} \qquad (120)$$
$$= \max\{H(X|Y), R_{Y|X}(d_2)\} . \qquad (121)$$

■

The next result covers the one large distortion setting. The proof follows directly from Theorem 1 and
is omitted. Note that it may differ from Corollary 7.3, the large distortion result for $R(d_1,d_2)$.
Corollary 8.2: For $d_1 \in \mathbb{R}_+$ we have that

$$R_{CR}(d_1,d_{2,\max}) = \min_{p_{\hat X|XY} \in \mathcal{P}_{\hat X|XY}(d_1)} \max\{I(X;\hat X|Y), I(Y;\hat X|X)\} , \qquad (122)$$

where $\mathcal{P}_{\hat X|XY}(d_1)$ denotes the set of all test channels $p_{\hat X|XY}$ mapping $\mathcal{X} \times \mathcal{Y}$ to $\hat{\mathcal{X}}$ such that

$$\sum_{x,y,\hat x} p_{\hat X|XY}(\hat x|x,y)\, q_{XY}(x,y)\, \delta_1(x,\hat x) \le d_1 . \qquad (123)$$

E. Small Distortions and a Proof of Theorem 2

The following result gives a useful upper bound for $R_{CR}(d_1,d_2)$. We will use this bound to prove the
small distortion result Theorem 2.

Corollary 8.3: For $(d_1,d_2) \in \mathbb{R}^2_+$ we have that

$$R_{CR}(d_1,d_2) \le \max\{R_{XY}(d_1,d_2) - R_X(d_1),\; R_{XY}(d_1,d_2) - R_Y(d_2)\} . \qquad (124)$$

Proof: Let $p_{\hat X\hat Y|XY}$ achieve the minimum for the joint rate-distortion function $R_{XY}(d_1,d_2)$. Since
$I(X;\hat X,\hat Y|Y) = I(X,Y;\hat X,\hat Y) - I(Y;\hat X,\hat Y)$ by the chain rule (and similarly with $X$ and $Y$ exchanged), we have

$$R_{CR}(d_1,d_2) \le R_{XY}(d_1,d_2) - \min\{I(X;\hat X,\hat Y), I(Y;\hat X,\hat Y)\} , \qquad (125)$$

where the remaining mutual information terms are evaluated using $p_{\hat X\hat Y|XY} \cdot q_{XY}$. Note that

$$I(X;\hat X,\hat Y) \ge I(X;\hat X) \ge R_X(d_1) , \qquad (126)$$

where the last inequality follows from the definition of $R_X(d_1)$. Similarly, we also have that $I(Y;\hat X,\hat Y) \ge
R_Y(d_2)$, and thus

$$R_{CR}(d_1,d_2) \le R_{XY}(d_1,d_2) - \min\{R_X(d_1), R_Y(d_2)\} . \qquad (127)$$

■

On combining this result with Proposition 1 and Theorem 7 we have

$$\max\{R_{XY}(d_1,d_2) - R_X(d_1),\; R_{XY}(d_1,d_2) - R_Y(d_2)\} \ge R_{CR}(d_1,d_2) \qquad (128)$$

$$\ge R(d_1,d_2) \qquad (129)$$

$$\ge \max\{R_{X|Y}(d_1), R_{Y|X}(d_2)\} . \qquad (130)$$

From this chain of inequalities, it is clear that if

$$R_{X|Y}(d_1) = R_{XY}(d_1,d_2) - R_Y(d_2) \quad \text{and} \qquad (131a)$$

$$R_{Y|X}(d_2) = R_{XY}(d_1,d_2) - R_X(d_1) , \qquad (131b)$$

then the RD function $R(d_1,d_2)$ and the CR-RD function $R_{CR}(d_1,d_2)$ both meet the cut-set
lower bound $R_L(d_1,d_2)$. The next two examples give situations where (131) holds.
Example 3: If $X$ and $Y$ are independent ($q_{XY} = q_X \cdot q_Y$), then

$$\max\{R_{X|Y}(d_1), R_{Y|X}(d_2)\} = \max\{R_X(d_1), R_Y(d_2)\} , \qquad (132)$$

and

$$R_{XY}(d_1,d_2) - \min\{R_X(d_1), R_Y(d_2)\} \qquad (133)$$
$$= R_X(d_1) + R_Y(d_2) - \min\{R_X(d_1), R_Y(d_2)\} \qquad (134)$$
$$= \max\{R_X(d_1), R_Y(d_2)\} ; \qquad (135)$$

therefore,

$$R(d_1,d_2) = R_{CR}(d_1,d_2) = \max\{R_X(d_1), R_Y(d_2)\} . \qquad (136)$$

Example 4: If $\delta_1$ and $\delta_2$ are Hamming measures, then

$$\max\{R_{X|Y}(0), R_{Y|X}(0)\} = \max\{H(X|Y), H(Y|X)\} \qquad (137)$$

and

$$R_{XY}(0,0) - \min\{R_X(0), R_Y(0)\} \qquad (138)$$
$$= H(X,Y) - \min\{H(X), H(Y)\} \qquad (139)$$
$$= \max\{H(X|Y), H(Y|X)\} ; \qquad (140)$$

therefore,

$$R(0,0) = R_{CR}(0,0) = \max\{H(X|Y), H(Y|X)\} . \qquad (141)$$
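The step from (139) to (140) is just the chain rule for entropy, which can be confirmed on any joint pmf. The sketch below uses an arbitrary illustrative $2 \times 2$ pmf (our choice, not taken from the paper) and compares the two expressions.

```python
import math

def entropy(pmf):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in pmf if p > 0.0)

# Arbitrary illustrative joint pmf q[x][y]; rows index x, columns index y.
q = [[0.4, 0.1],
     [0.2, 0.3]]

h_xy = entropy(p for row in q for p in row)
h_x = entropy(sum(row) for row in q)
h_y = entropy(sum(col) for col in zip(*q))

upper = h_xy - min(h_x, h_y)                 # expression (139)
lower = max(h_xy - h_y, h_xy - h_x)          # max{H(X|Y), H(Y|X)}, expression (140)
```

Both quantities coincide because $H(X,Y) - H(Y) = H(X|Y)$ and subtracting the smaller marginal entropy yields the larger conditional entropy.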



This idea of matching the lower and upper bounds in (130) is not just useful for these simple examples.
Our main result, Theorem 2, showed that it is also useful for sources with Hamming distortion measures and
small distortions. The proof of this result is a simple consequence of Corollary 8.3.

Proof of Theorem 2: Let us recall Gray's results for the extended Shannon lower bounds of joint,
conditional and marginal RD functions. Specifically, from [19, Thm. 3.2 & Cor. 3.2] there exists a strictly
positive surface $\mathcal{D}$ in $\mathbb{R}^2_+$ such that

$$R_{XY}(d_1,d_2) = R_{X|Y}(d_1) + R_Y(d_2) , \quad \text{and} \qquad (142a)$$
$$R_{XY}(d_1,d_2) = R_{Y|X}(d_2) + R_X(d_1) \qquad (142b)$$

for all $(d_1,d_2) \in \mathbb{R}^2_+$ that lie on or below $\mathcal{D}$. Combining this result with (130) proves the theorem. ■

Fig. 6. Doubly Symmetric Binary Source (DSBS) with crossover probability $p$.



F. Proof of Theorem 5

The joint pmf $q_{XY}$ of the DSBS can be thought of as resulting from using $X$ as a uniform input to a
binary symmetric channel (BSC) with crossover probability $p$; see Figure 6. By symmetry, we can also
think of $q_{XY}$ as resulting from using $Y$ as a uniform input to a BSC with crossover probability $p$.

1) Proof of (44): In Example 1 it was shown that the RD function without common reconstructions
$R(d_1,d_2)$ equals the cut-set lower bound $R_L(d_1,d_2)$. Since $R_{CR}(d,d) \ge R(d,d)$ for all $d \in [0,1/2]$, we
have that

$$R_{CR}(d,d) \ge R(d,d) = \begin{cases} h(p) - h(d), & \text{for } 0 \le d \le p \\ 0, & \text{for } d > p . \end{cases} \qquad (143)$$

Let

$$d^* \triangleq \frac{1}{2} - \frac{1}{2}\sqrt{1-2p} \qquad (144)$$

and note that $d^* < p$. For any $d \in [0,d^*]$, we now construct a test channel $p_{\hat X\hat Y|XY}$ that belongs to
$\mathcal{P}_{\hat X\hat Y|XY}(d,d)$ and has $I(X;\hat X,\hat Y|Y) = I(Y;\hat X,\hat Y|X) = h(p) - h(d)$.
Fix $d \in [0,d^*]$, and let

$$\beta \triangleq \frac{d^* - d}{1 - 2d} . \qquad (145)$$

Note that $d \star \beta = d^*$, where $d \star \beta = d(1-\beta) + (1-d)\beta$ is the binary convolution. Let $\mathcal{W} = \{0,1\}$. We
now define a joint pmf $p(x,\hat x,w,\hat y,y)$ on $\mathcal{X} \times \hat{\mathcal{X}} \times \mathcal{W} \times \hat{\mathcal{Y}} \times \mathcal{Y}$ by assuming a uniform input to the
cascade of the four BSCs shown in Figure 7. Specifically, we set

$$p(x,\hat x,w,\hat y,y) = p(x)\, p(\hat x|x)\, p(w|\hat x)\, p(\hat y|w)\, p(y|\hat y) , \qquad (146)$$

where $p(x) = 1/2$ for $x = 0$ and $x = 1$ and

$$p(\hat x|x) = (1-d)\,1_{\hat x,x} + d\,(1 - 1_{\hat x,x}) \qquad (147a)$$

$$p(w|\hat x) = (1-\beta)\,1_{w,\hat x} + \beta\,(1 - 1_{w,\hat x}) \qquad (147b)$$

$$p(\hat y|w) = (1-\beta)\,1_{\hat y,w} + \beta\,(1 - 1_{\hat y,w}) \qquad (147c)$$

$$p(y|\hat y) = (1-d)\,1_{y,\hat y} + d\,(1 - 1_{y,\hat y}) . \qquad (147d)$$

Note that since $p(x)$ is uniform we may equivalently view $p(x,\hat x,w,\hat y,y)$ as resulting from using $p(y)$
as a uniform input to the (reverse) cascade of four BSCs shown in Figure 7.

Fig. 7. DSBS test channel configuration for $d < d^*$: the uniform input $q_X$ drives a cascade of four BSCs $p_{\hat X|X}(\hat x|x)$, $p_{W|\hat X}(w|\hat x)$, $p_{\hat Y|W}(\hat y|w)$ and $p_{Y|\hat Y}(y|\hat y)$.

By construction, the expected distortions $\mathbb{E}[\delta_1(X,\hat X)]$ and $\mathbb{E}[\delta_2(Y,\hat Y)]$ for this joint pmf are both equal
to $d$. Moreover, since $d \star \beta = d^*$ and $d^* \star d^* = p$ we have that

$$\sum_{\hat x,w,\hat y} p(x,\hat x,w,\hat y,y) = q_{XY}(x,y) , \qquad (148)$$

and the joint pmf $p(x,\hat x,w,\hat y,y)$ defines a valid channel in $\mathcal{P}_{\hat X\hat Y|XY}(d,d)$. Combining this channel with
Theorem 1 yields

$$R_{CR}(d,d) \le \max\{I(X;\hat X,\hat Y|Y), I(Y;\hat X,\hat Y|X)\} \qquad (149)$$

$$= \max\{H(X|Y) - H(X|Y,\hat X,\hat Y),\; H(Y|X) - H(Y|X,\hat X,\hat Y)\} \qquad (150)$$

$$= \max\{H(X|Y) - H(X|\hat X),\; H(Y|X) - H(Y|\hat Y)\} \qquad (151)$$

$$= h(p) - h(d) , \qquad (152)$$

where (151) follows because $X \leftrightarrow \hat X \leftrightarrow (Y,\hat Y)$ and $Y \leftrightarrow \hat Y \leftrightarrow (X,\hat X)$ form Markov chains, and (152)
follows by construction.
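The parameter choices in this construction can be checked numerically. The sketch below assumes the cascade order $X \to \hat X \to W \to \hat Y \to Y$ with BSC crossover probabilities $d$, $\beta$, $\beta$, $d$ (our reading of Figure 7), and verifies that $d \star \beta = d^*$ and that the end-to-end crossover from $X$ to $Y$ equals $p$. The values $p = 0.2$ and $d = 0.05$ are illustrative only.

```python
import math

def h(x: float) -> float:
    """Binary entropy function in bits."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)

def conv(a: float, b: float) -> float:
    """Binary convolution: crossover probability of two cascaded BSCs."""
    return a * (1.0 - b) + (1.0 - a) * b

def d_star(p: float) -> float:
    """Threshold distortion of (144); it satisfies conv(d*, d*) = p."""
    return 0.5 - 0.5 * math.sqrt(1.0 - 2.0 * p)

def beta(p: float, d: float) -> float:
    """Crossover probability of (145), chosen so that conv(d, beta) = d*."""
    return (d_star(p) - d) / (1.0 - 2.0 * d)

p, d = 0.2, 0.05                      # illustrative values with d <= d*(p)
ds = d_star(p)
b = beta(p, d)

x_to_w = conv(d, b)                   # crossover X -> W, expected to equal d*
x_to_y = conv(conv(x_to_w, b), d)     # crossover X -> Y, expected to equal p
rate = h(p) - h(d)                    # value of (152) for this choice
```

Because binary convolution is associative and commutative, the four-stage cascade collapses to $d^* \star d^* = p$, which is what makes (148) hold.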

2) Proof of (45): At $d_1 = d_2 = d^*$, we have that

$$R(d^*,d^*) = R_{CR}(d^*,d^*) = h(p) - h(d^*) . \qquad (153)$$
The marginal RD functions of $X$ and $Y$ are given by

$$R_X(d^*) = 1 - h(d^*) \quad \text{and} \qquad (154a)$$
$$R_Y(d^*) = 1 - h(d^*) . \qquad (154b)$$
Wyner showed that the common information of $X$ and $Y$ is given by [23, Eqn. 1.19]

$$K(X;Y) = 1 + h(p) - 2h(d^*) . \qquad (155)$$
Therefore, $R_{CR}(d^*,d^*) = K(X;Y) - R_X(d^*)$.

Remark 3: The $W$ that achieves the minimum for $K(X;Y)$ is the same as the $W$ in Figure 7.
Specifically, for $d_1 = d_2 = d^*$ we use $\hat X = \hat Y = W$.
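The identity $R_{CR}(d^*,d^*) = K(X;Y) - R_X(d^*)$ is a one-line computation from (153) to (155); the sketch below simply evaluates both sides for a few crossover probabilities. The function names are ours.

```python
import math

def h(x: float) -> float:
    """Binary entropy function in bits."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)

def common_information_gap(p: float) -> float:
    """Difference between K(X;Y) - R_X(d*) and h(p) - h(d*) for a DSBS(p)."""
    ds = 0.5 - 0.5 * math.sqrt(1.0 - 2.0 * p)      # d* from (144)
    wyner_k = 1.0 + h(p) - 2.0 * h(ds)             # (155)
    r_x = 1.0 - h(ds)                              # (154a)
    return (wyner_k - r_x) - (h(p) - h(ds))

gaps = [common_information_gap(p) for p in (0.05, 0.1, 0.2, 0.4)]
```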

3) Proof of (46a): Suppose that $d_1 = d_2 = d$. If $d = 0$, then it is clear that $R(0,0) = R_{CR}(0,0) =
H(X|Y) = H(Y|X) = h(p)$. Moreover, it is optimal to choose

$$p_{\hat X\hat Y|XY}(\hat x,\hat y|x,y) = \begin{cases} 1, & \text{if } \hat x = x \text{ and } \hat y = y \\ 0, & \text{if } \hat x \ne x \text{ or } \hat y \ne y . \end{cases} \qquad (156)$$

With this choice of test channel, we have that $X \leftrightarrow (\hat X,\hat Y) \leftrightarrow Y$ forms a Markov chain. The next lemma
shows that this chain is necessary for $R_{CR}(d,d) = R(d,d)$.



Lemma 4: If $R_{CR}(d,d) = R(d,d)$, then the minimum

$$\min_{p_{\hat X\hat Y|XY} \in \mathcal{P}_{\hat X\hat Y|XY}(d,d)} \max\{I(X;\hat X,\hat Y|Y), I(Y;\hat X,\hat Y|X)\} \qquad (157)$$

is achieved by a test channel $p^*_{\hat X\hat Y|XY}$ for which the resultant joint pmf for $(X,Y,\hat X,\hat Y)$ factors to form
the Markov chain $X \leftrightarrow (\hat X,\hat Y) \leftrightarrow Y$ and $H(X|\hat X,\hat Y) = H(Y|\hat X,\hat Y) = h(d)$.

Proof: Suppose $p^*_{\hat X\hat Y|XY}$ achieves the minimum in (157). From the definition of the conditional
rate-distortion function, $R_{X|Y}(d)$, the following is apparent:

$$I(X;\hat X,\hat Y|Y) \ge I(X;\hat X|Y) \ge R_{X|Y}(d) = h(p) - h(d) . \qquad (158)$$

Similarly,

$$I(Y;\hat X,\hat Y|X) \ge I(Y;\hat Y|X) \ge R_{Y|X}(d) = h(p) - h(d) . \qquad (159)$$

If $\max\{I(X;\hat X,\hat Y|Y), I(Y;\hat X,\hat Y|X)\} = h(p) - h(d)$ then from (158) and (159) and $H(X|Y) =
H(Y|X) = h(p)$, we have

$$H(X|\hat X,\hat Y,Y) = H(Y|\hat X,\hat Y,X) = h(d) . \qquad (160)$$

Then, we further have

$$I(X;Y|\hat X,\hat Y) = H(X|\hat X,\hat Y) - H(X|\hat X,\hat Y,Y) \qquad (161)$$
$$= H(X|\hat X,\hat Y) - h(d) \qquad (162)$$
$$= H(X \oplus \hat X|\hat X,\hat Y) - h(d) \qquad (163)$$
$$\le H(X \oplus \hat X) - h(d) \qquad (164)$$
$$\le 0 , \qquad (165)$$

where (165) holds because $\Pr[X \ne \hat X] \le d \le 1/2$. The non-negativity of conditional mutual information gives
$I(X;Y|\hat X,\hat Y) = 0$ and therefore $X \leftrightarrow (\hat X,\hat Y) \leftrightarrow Y$. The proof is completed by combining this chain
with (160) to get $H(X|\hat X,\hat Y) = H(Y|\hat X,\hat Y) = h(d)$. ■



The proof of (46a) will follow via a contradiction. Suppose there exists $d > d^*$ such that $R_{CR}(d,d) =
h(p) - h(d)$. From Theorem 3 we have that

$$R_{CR}(d,d) = \min_{p_{\hat X\hat Y|XY} \in \mathcal{P}_{\hat X\hat Y|XY}(d,d)} \Big[ I(X,Y;\hat X,\hat Y) - \min\{I(X;\hat X,\hat Y), I(Y;\hat X,\hat Y)\} \Big] . \qquad (166)$$

Let $p^*_{\hat X\hat Y|XY}$ be the test channel that achieves the indicated minimum, and consider the term $I(X,Y;\hat X,\hat Y)$
in (166). From Lemma 4 the joint pmf induced by $p^*_{\hat X\hat Y|XY}$ and $q_{XY}$ factors to form the Markov chain
$X \leftrightarrow (\hat X,\hat Y) \leftrightarrow Y$; therefore, $I(X,Y;\hat X,\hat Y)$ can be lower bounded by Wyner's common information [23,
Sec. 3] via

$$I(X,Y;\hat X,\hat Y) \ge 1 + h(p) - 2h(d^*) . \qquad (167)$$

We have $H(Y) = 1$, and from Lemma 4 we have $H(X|\hat X,\hat Y) = H(Y|\hat X,\hat Y) = h(d)$. Since $h(p) - h(d)$
is strictly decreasing on $[0,p)$ it follows that $R_{CR}(d,d) < R_{CR}(d^*,d^*)$, which is equivalent to

$$h(p) - h(d^*) > I(X,Y;\hat X,\hat Y) - I(Y;\hat X,\hat Y) , \qquad (168)$$

and by the above discussion

$$h(p) - h(d) > 1 + h(p) - 2h(d^*) - [1 - h(d)] , \qquad (169)$$
which implies $h(d^*) > h(d)$, which is a contradiction since $h(\cdot)$ is strictly increasing on $[0,1/2]$.

Fig. 8. Depiction of the channel $p^*_{W|XY}(w|x,y)$ from $\mathcal{X} \times \mathcal{Y}$ to $\mathcal{W}$, with inputs $q_{XY}(0,0) = q_{XY}(1,1) = \frac{1}{2}(1-p)$ and $q_{XY}(0,1) = q_{XY}(1,0) = \frac{1}{2}p$. The transitions represented by dotted lines each have probability
$p^*_{W|XY}(w|x,y) = 1/2$.

4) Proof of (46b): We choose a channel $p^*_{W|XY}$ that achieves the bound given in (46b). Let $\mathcal{W} =
\{0,1\}$, and define

$$p^*_{W|XY}(w|x,y) \triangleq 1_{x,y}\big[(1-a)\,1_{w,x} + a\,(1 - 1_{w,x})\big] + \tfrac{1}{2}\,(1 - 1_{x,y}) , \qquad (170)$$

where

$$a \triangleq \frac{2d - p}{2(1-p)} , \qquad (171)$$

and $1_{x,y}$ and $1_{w,x}$ are indicator functions (equal to one if the subscripts are equal and zero otherwise).
The channel $p^*_{W|XY}(w|x,y)$ is depicted in Figure 8.

Set $\hat X = W$ and $\hat Y = W$. Note that

$$p_{W|X}(w|x) = \sum_{y \in \mathcal{Y}} p^*_{W|XY}(w|x,y)\, q_{Y|X}(y|x) \quad \text{and} \qquad (172a)$$

$$p_{W|Y}(w|y) = \sum_{x \in \mathcal{X}} p^*_{W|XY}(w|x,y)\, q_{X|Y}(x|y) \qquad (172b)$$

are both BSCs with a crossover probability $d$. Therefore, $\mathbb{E}[\delta_1(X,\hat X)] = d$ and $\mathbb{E}[\delta_2(Y,\hat Y)] = d$. Finally,
the rate of the channel is given by

$$I(X;W|Y) = H(W|Y) - H(W|X,Y) \qquad (173)$$

$$= h(d) - [p + (1-p)h(a)] . \qquad (174)$$

By symmetry, we also have $I(Y;W|X) = h(d) - p - (1-p)h(a)$, which completes the proof.
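The rate expression (174) can be checked by brute force on the eight-point alphabet of $(X,Y,W)$. The sketch below assumes the channel suggested by Figure 8: when $x = y$ the output $W$ equals $x$ with probability $1-a$, and when $x \ne y$ the output is uniform, with $a = (2d-p)/(2(1-p))$. That reading of the figure, the helper names, and the values $p = 0.2$, $d = 0.15$ are assumptions made for illustration.

```python
import math
from itertools import product

def h(x: float) -> float:
    """Binary entropy function in bits."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)

def entropy(values) -> float:
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(v * math.log2(v) for v in values if v > 0.0)

def marginal(joint, keep):
    """Marginal pmf over the coordinates listed in `keep`."""
    out = {}
    for key, val in joint.items():
        sub = tuple(key[i] for i in keep)
        out[sub] = out.get(sub, 0.0) + val
    return out

def fig8_joint(p: float, d: float):
    """Joint pmf of (X, Y, W) for a DSBS(p) and the assumed Fig. 8 channel."""
    a = (2.0 * d - p) / (2.0 * (1.0 - p))
    joint = {}
    for x, y, w in product((0, 1), repeat=3):
        if x == y:
            q = (1.0 - p) / 2.0
            chan = 1.0 - a if w == x else a
        else:
            q = p / 2.0
            chan = 0.5                      # dotted transitions in Fig. 8
        joint[(x, y, w)] = q * chan
    return a, joint

p, d = 0.2, 0.15                            # illustrative, with p/2 <= d <= 1/2
a, joint = fig8_joint(p, d)

dist_x = sum(v for (x, y, w), v in joint.items() if w != x)   # Pr[W != X]
dist_y = sum(v for (x, y, w), v in joint.items() if w != y)   # Pr[W != Y]

h_w_given_y = entropy(marginal(joint, (1, 2)).values()) - entropy(marginal(joint, (1,)).values())
h_w_given_xy = entropy(joint.values()) - entropy(marginal(joint, (0, 1)).values())
rate = h_w_given_y - h_w_given_xy           # I(X;W|Y), as in (173)
closed_form = h(d) - p - (1.0 - p) * h(a)   # right-hand side of (174)
```

Under these assumptions both Hamming distortions equal $d$ and the brute-force value of $I(X;W|Y)$ agrees with the closed form.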



Remark 4: The channel $p^*_{W|XY}(w|x,y)$ can be viewed as the natural continuation of the channel (146),
which was used to prove (44). Specifically, $p^*_{W|XY}(w|x,y)$ is formed by passing $W$ through a BSC with
crossover probability $(d - d^*)/(1 - 2d^*)$. This latter quantity is chosen because

$$d = d^* \star \frac{d - d^*}{1 - 2d^*} . \qquad (175)$$

November 23, 2010 DRAFT

V. Joint Source-Channel Coding: Auxiliary Results & Proofs

We now extend the source coding results of Section IV to the joint source-channel coding setting
(Definitions 3 and 4). We begin by proving Theorem 5.

A. Proof of Theorem 5

Proof: If $(d_1,d_2) \in \mathbb{R}^2_+$ is admissible with bandwidth expansion factor $\kappa$, then by definition there
exists for every $\epsilon > 0$ a joint source-channel code $(f^{(t)}, g_1^{(t)}, g_2^{(t)})$ with $\Delta_i^{(t)} < d_i + \epsilon$, $i = 1,2$.

Let $\mathbf{W} = W_1, W_2, \ldots, W_{\kappa_c t}$ denote the codeword that is produced by the encoder. Let $p_{W_i}$ denote the
marginal pmf for the $i$th symbol $W_i$. Define a new "time-shared" random variable $W$ on $\mathcal{W}$ with pmf

$$p_W(w) \triangleq \frac{1}{\kappa_c t} \sum_{i=1}^{\kappa_c t} p_{W_i}(w) . \qquad (176)$$

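Since $I(W;U)$ is a concave function of $p_W$ for fixed $q_{U|W}$, we have from Jensen's inequality

$$I(W;U) \ge \frac{1}{\kappa_c t} \sum_{i=1}^{\kappa_c t} I(W_i;U_i) . \qquad (177)$$

We further have

$$I(\mathbf{W};\mathbf{U}) = H(\mathbf{U}) - H(\mathbf{U}|\mathbf{W}) \qquad (178)$$

$$= \sum_{i=1}^{\kappa_c t} \big[ H(U_i|U_1,U_2,\ldots,U_{i-1}) - H(U_i|\mathbf{W},U_1,U_2,\ldots,U_{i-1}) \big] \qquad (179)$$

$$\le \sum_{i=1}^{\kappa_c t} \big[ H(U_i) - H(U_i|W_i) \big] \qquad (180)$$

$$= \sum_{i=1}^{\kappa_c t} I(W_i;U_i) , \qquad (181)$$

where (180) follows because $U_i \leftrightarrow W_i \leftrightarrow (W_1,W_2,\ldots,W_{i-1},W_{i+1},W_{i+2},\ldots,W_{\kappa_c t},U_1,U_2,\ldots,U_{i-1})$
forms a Markov chain.
Then we have

$$\kappa_c I(W;U) \ge \frac{1}{t} \sum_{i=1}^{\kappa_c t} I(W_i;U_i) \qquad (182)$$

$$\ge \frac{1}{t} I(\mathbf{W};\mathbf{U}) \qquad (183)$$

$$\ge \frac{1}{t} I(\mathbf{X},\mathbf{Y};\mathbf{U}) \qquad (184)$$

$$\ge \frac{1}{t} I(\mathbf{X};\mathbf{U}|\mathbf{Y}) \qquad (185)$$

$$= \frac{1}{t} \sum_{i=1}^{\kappa_s t} I(X_i;\mathbf{U}|X_1^{i-1},\mathbf{Y}) \qquad (186)$$

$$= \frac{1}{t} \sum_{i=1}^{\kappa_s t} I(X_i;\mathbf{U},X_1^{i-1},Y_1^{i-1},Y_{i+1}^{\kappa_s t}|Y_i) \qquad (187)$$

$$\ge \frac{1}{t} \sum_{i=1}^{\kappa_s t} I(X_i;\hat X_i|Y_i) \qquad (188)$$

$$\ge \frac{1}{t} \sum_{i=1}^{\kappa_s t} R_{X|Y}(d_{1,i}) \qquad (189)$$

$$\ge \kappa_s R_{X|Y}(\Delta_1^{(t)}) \qquad (190)$$

$$\ge \kappa_s R_{X|Y}(d_1 + \epsilon) , \qquad (191)$$

where (184) follows from the data-processing inequality, (187) follows because $(X,Y)$ is iid, (188)
follows from the data-processing inequality and the fact that $\hat X_i$ is a function of $(\mathbf{U},\mathbf{Y})$, (189) follows
from the definition of the conditional rate-distortion function where $d_{1,i} = \mathbb{E}\,\delta_1(X_i,\hat X_i)$, (190)
combines Jensen's inequality and the convexity of $R_{X|Y}(d_1)$ in $d_1$ with $\Delta_1^{(t)} = \frac{1}{\kappa_s t} \sum_i d_{1,i}$, and (191) follows because $R_{X|Y}(d_1)$
is non-increasing in $d_1$. Similarly, it can be shown that

$$\kappa_c I(W;V) \ge \kappa_s R_{Y|X}(d_2 + \epsilon) . \qquad (192)$$

The theorem follows from the continuity of $R_{X|Y}(d_1)$ and $R_{Y|X}(d_2)$ on $\mathbb{R}_+$ and the fact that $\epsilon > 0$ is
arbitrary. ■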



B. Achievability of Theorem 5

We now adapt an achievability result of Nayak, Tuncel and Gündüz [27] to give a sufficient condition
for joint source-channel coding. When combined with Theorem 5, this condition will give necessary and
sufficient conditions for joint source-channel coding of jointly Gaussian random variables with squared-error
distortion measures.
Lemma 5 ([27, Cor. 1]): Let $\mathcal{C}$ be a finite set. A distortion pair $(d_1,d_2) \in \mathbb{R}^2_+$ is admissible with
bandwidth expansion $\kappa$ if the following conditions are satisfied:

(i) there exist random variables $W$ on $\mathcal{W}$ and $C$ on $\mathcal{C}$;

(ii) there exist functions $\pi_1 : \mathcal{C} \times \mathcal{Y} \to \hat{\mathcal{X}}$ and $\pi_2 : \mathcal{C} \times \mathcal{X} \to \hat{\mathcal{Y}}$ with

$$\mathbb{E}[\delta_1(X,\pi_1(C,Y))] \le d_1 \qquad (193a)$$

$$\mathbb{E}[\delta_2(Y,\pi_2(C,X))] \le d_2 ; \qquad (193b)$$

(iii) the following inequalities hold

$$I(X;C|Y) \le \kappa I(W;U) \qquad (194a)$$

$$I(Y;C|X) \le \kappa I(W;V) . \qquad (194b)$$

Lemma 5 is the joint source-channel coding extension of the one-description upper bound given in
Lemma 1. The lemma is actually a special case of a stronger result [27, Thm. 1]; however, this weaker
result will suffice for the following discussion. Note also that the Markov constraints in [27, Cor. 1] do not
play a role here as the side-information is available to the transmitter.

The next two corollaries combine Theorem 5 and Lemma 5 to give necessary and sufficient conditions
for the following two special cases: (i) the source $q_{XY}$ has zero rate-loss in the Wyner-Ziv problem, and
(ii) one source has to be reconstructed with vanishing Hamming distortion.

Corollary 8.4: If qxy has zero rate-loss in the Wyner-Ziv problem (i.e., R x \Y{di) = B^^{d\) and 
R Y \x{d2) = Ry^lx^)), then (di,^) £ is achievable with bandwidth expansion k if and only if 



there exists a pmf pw on W such that (49) holds. 



As discussed before, the zero Wyner-Ziv rate-loss condition is very restrictive and few sources are 
known to satisfy it. However, an interesting example that does satisfy this condition is given next. 

Example 5: If $(X, Y)$ are jointly Gaussian random variables and $\delta_1$ and $\delta_2$ are squared-error distortion
measures (69) (see Example 2), then $(d_1, d_2)$ is achievable with bandwidth expansion $\kappa$ if and
only if there exists a pmf $p_W$ on $\mathscr{W}$ such that (49) holds. The conditional RD functions $R_{X|Y}(d_1)$ and
$R_{Y|X}(d_2)$ are given in (72).
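As a hedged illustration of Example 5, the sketch below evaluates the standard closed form of the conditional RD function for jointly Gaussian sources under squared-error distortion, $R_{X|Y}(d) = \max\{0, \tfrac{1}{2}\log_2(\sigma^2_{X|Y}/d)\}$ with $\sigma^2_{X|Y} = \sigma^2_X(1-\rho^2)$. The function and parameter names are assumptions introduced here for illustration; (72) remains the authoritative statement in this paper.

```python
import math


def conditional_variance(var_x: float, rho: float) -> float:
    """Variance of X given Y for jointly Gaussian (X, Y) with correlation rho."""
    return var_x * (1.0 - rho ** 2)


def gaussian_conditional_rd(var_x: float, rho: float, d: float) -> float:
    """Conditional RD function R_{X|Y}(d) in bits per symbol (squared error)."""
    if d <= 0:
        raise ValueError("distortion must be positive")
    cond_var = conditional_variance(var_x, rho)
    if d >= cond_var:
        # The side information alone already meets the distortion target.
        return 0.0
    return 0.5 * math.log2(cond_var / d)


# Hypothetical operating points (not taken from the paper).
rate_uncorrelated = gaussian_conditional_rd(var_x=1.0, rho=0.0, d=0.25)
rate_correlated = gaussian_conditional_rd(var_x=4.0, rho=0.5, d=0.75)
```

By symmetry, $R_{Y|X}(d_2)$ is obtained by swapping the roles of the two variances.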

Corollary 8.5: If $\delta_1$ is a Hamming distortion measure, then $(0, d_2)$ is achievable with bandwidth
expansion $\kappa$ if and only if there exists a pmf $p_W$ on $\mathscr{W}$ such that

\[ H(X | Y) \le \kappa I(W; U) \quad \text{and} \tag{195a} \]
\[ R_{Y|X}(d_2) \le \kappa I(W; V) . \tag{195b} \]
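To make the two conditions of Corollary 8.5 concrete, the following sketch checks them for a single candidate input pmf on a toy broadcast channel. It assumes the inequalities are non-strict, and the channel matrices, rates and function names are hypothetical values chosen for illustration, not quantities from this paper.

```python
import math


def mutual_information(p_w, channel):
    """I(W;U) in bits for input pmf p_w[w] and channel[w][u] = p(u|w)."""
    n_out = len(channel[0])
    p_u = [sum(p_w[w] * channel[w][u] for w in range(len(p_w)))
           for u in range(n_out)]
    total = 0.0
    for w, pw in enumerate(p_w):
        for u in range(n_out):
            joint = pw * channel[w][u]
            if joint > 0:
                total += joint * math.log2(channel[w][u] / p_u[u])
    return total


def meets_conditions(h_x_given_y, r_y_given_x, kappa, p_w, chan_u, chan_v):
    """Check H(X|Y) <= kappa*I(W;U) and R_{Y|X}(d2) <= kappa*I(W;V)."""
    return (h_x_given_y <= kappa * mutual_information(p_w, chan_u)
            and r_y_given_x <= kappa * mutual_information(p_w, chan_v))


# Hypothetical marginal channels to users 1 and 2 (binary symmetric).
chan_u = [[0.9, 0.1], [0.1, 0.9]]
chan_v = [[0.8, 0.2], [0.2, 0.8]]
p_w = [0.5, 0.5]

ok_full_bandwidth = meets_conditions(0.469, 0.2, 1.0, p_w, chan_u, chan_v)
ok_half_bandwidth = meets_conditions(0.469, 0.2, 0.5, p_w, chan_u, chan_v)
```

The corollary asks for the existence of some $p_W$, so a complete check would search over input pmfs rather than test one candidate.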

Proof of Corollary 8.4: The necessary condition ("only if") is given by Theorem 5. The sufficient
condition ("if") is proved by constructing an auxiliary random variable $C$ that meets the conditions of
Lemma 5 with $I(X; C | Y) = R_{X|Y}(d_1)$ and $I(Y; C | X) = R_{Y|X}(d_2)$. Recall that

\[ R^{WZ}_{X|Y}(d_1) = \min I(X; A | Y) , \quad \text{and} \tag{196} \]
\[ R^{WZ}_{Y|X}(d_2) = \min I(Y; B | X) . \tag{197} \]



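Let $p'$ and $p''$ be joint pmfs on $\mathscr{A} \times \mathscr{X} \times \mathscr{Y}$ and $\mathscr{B} \times \mathscr{X} \times \mathscr{Y}$ that achieve the aforementioned minima.
Let $p$ be the joint pmf on $\mathscr{A} \times \mathscr{B} \times \mathscr{X} \times \mathscr{Y}$ defined by

\[ p(a, b, x, y) = \begin{cases} \dfrac{p'(a, x, y)\, p''(b, x, y)}{q_{XY}(x, y)} , & \text{if } q_{XY}(x, y) > 0, \\ 0 , & \text{otherwise.} \end{cases} \tag{198} \]

By construction, the $(A, X, Y)$ and $(B, X, Y)$ marginals of $p$ are $p'$ and $p''$, and $p$ satisfies the chain
$A \leftrightarrow (X, Y) \leftrightarrow B$. Recall that $p'$ satisfies the chain $A \leftrightarrow X \leftrightarrow Y$, and $p''$ satisfies the chain $B \leftrightarrow Y \leftrightarrow X$.
Combining these chains yields the long chain $A \leftrightarrow X \leftrightarrow Y \leftrightarrow B$.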
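The coupling in (198), read here as $p(a,b,x,y) = p'(a,x,y)\,p''(b,x,y)/q_{XY}(x,y)$ wherever $q_{XY}(x,y) > 0$, can be checked numerically. The sketch below is a minimal illustration under assumed toy test channels (a doubly symmetric binary source with binary symmetric test channels); none of these values come from the paper.

```python
import itertools
import math


def couple(p1, p2, q_xy):
    """Build p(a,b,x,y) = p1(a,x,y) * p2(b,x,y) / q(x,y), zero where q = 0."""
    a_vals = sorted({a for (a, _, _) in p1})
    b_vals = sorted({b for (b, _, _) in p2})
    p = {}
    for (x, y), q in q_xy.items():
        for a, b in itertools.product(a_vals, b_vals):
            if q > 0:
                p[(a, b, x, y)] = p1.get((a, x, y), 0.0) * p2.get((b, x, y), 0.0) / q
            else:
                p[(a, b, x, y)] = 0.0
    return p


def marginal(p, keep):
    """Marginalise a pmf stored as {tuple: prob} onto the coordinates in keep."""
    out = {}
    for key, v in p.items():
        k = tuple(key[i] for i in keep)
        out[k] = out.get(k, 0.0) + v
    return out


def cond_mi_a_b_given_xy(p):
    """I(A;B|X,Y) in bits for a pmf on (a, b, x, y)."""
    p_axy, p_bxy = marginal(p, (0, 2, 3)), marginal(p, (1, 2, 3))
    p_xy = marginal(p, (2, 3))
    total = 0.0
    for (a, b, x, y), v in p.items():
        if v > 0:
            total += v * math.log2(v * p_xy[(x, y)] / (p_axy[(a, x, y)] * p_bxy[(b, x, y)]))
    return total


def bsc(eps):
    """Binary symmetric channel as {(output, input): probability}."""
    return {(o, i): (1 - eps if o == i else eps) for o in (0, 1) for i in (0, 1)}


# Hypothetical source and test channels: A depends on X only, B on Y only.
q_xy = {(x, y): (0.45 if x == y else 0.05) for x in (0, 1) for y in (0, 1)}
ch_a, ch_b = bsc(0.2), bsc(0.3)
p1 = {(a, x, y): q * ch_a[(a, x)] for (x, y), q in q_xy.items() for a in (0, 1)}
p2 = {(b, x, y): q * ch_b[(b, y)] for (x, y), q in q_xy.items() for b in (0, 1)}
p = couple(p1, p2, q_xy)
```

In this example the marginals of `p` reproduce `p1` and `p2`, and $I(A;B|X,Y)$ vanishes, which is the chain $A \leftrightarrow (X,Y) \leftrightarrow B$.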

Set $\mathscr{C} = \mathscr{A} \times \mathscr{B}$ and $C = (A, B)$. Note that $C$ is a valid auxiliary random variable for Lemma 5.
Moreover, we have

\begin{align}
I(X; C | Y) &= I(X; A, B | Y) \tag{199} \\
&= I(X; A | Y) + I(X; B | A, Y) \tag{200} \\
&= I(X; A | Y) \tag{201} \\
&= R^{WZ}_{X|Y}(d_1) \tag{202} \\
&= R_{X|Y}(d_1) , \tag{203}
\end{align}

where (201) follows because $A \leftrightarrow X \leftrightarrow Y \leftrightarrow B$ implies $X \leftrightarrow (A, Y) \leftrightarrow B$, (202) follows because $p'$ is
an optimal test channel for the Wyner-Ziv RD function, and (203) follows by assumption. Similarly, we
have $I(Y; C | X) = R_{Y|X}(d_2)$. ■
Proof of Corollary 8.5: The necessary condition ("only if") follows from Theorem 5 and
$R_{X|Y}(0) = H(X | Y)$. The sufficient condition ("if") is proved by constructing an auxiliary random
variable $C$ that meets the conditions of Lemma 5 as well as $I(X; C | Y) = R_{X|Y}(0) = H(X | Y)$ and
$I(Y; C | X) = R_{Y|X}(d_2)$.

Recall the joint pmf of $(X, Y, \hat{X}, \hat{Y})$ used to prove Corollary 8.1. Choose $C = (\hat{X}, \hat{Y})$ and note that this
choice of $C$ meets the conditions of Lemma 5. As before, we also have that $I(X; C | Y) = I(X; \hat{X}, \hat{Y} | Y) =
H(X | Y)$ and $I(Y; C | X) = I(Y; \hat{X}, \hat{Y} | X) = R_{Y|X}(d_2)$. ■

C. Proof of Theorem 4

The sufficient condition is a special case of Lemma 5. We now give the necessary condition. If a
distortion pair $(d_1, d_2)$ is achievable with bandwidth expansion $\kappa = \kappa_c / \kappa_s$, then by definition there
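exists for every $\epsilon > 0$ a CR-JSC code $(f^{(t)}, g_1^{(t)}, g_2^{(t)}, \phi_1^{(t)}, \phi_2^{(t)})$ with

\[ \Delta_1^{(t)} \le d_1 + \epsilon \quad \text{and} \quad \Delta_2^{(t)} \le d_2 + \epsilon , \tag{204} \]

as well as

\[ \Pr\big[ \phi_1^{(t)}(\mathbf{V}, \mathbf{X}) \ne g_1^{(t)}(\mathbf{U}, \mathbf{Y}) \big] \le \epsilon \quad \text{and} \tag{205} \]
\[ \Pr\big[ \phi_2^{(t)}(\mathbf{U}, \mathbf{Y}) \ne g_2^{(t)}(\mathbf{V}, \mathbf{X}) \big] \le \epsilon . \tag{206} \]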

As in the proof of Theorem 5, let $\mathbf{W} = f^{(t)}(\mathbf{X}, \mathbf{Y})$, let $p_{W_i}$ denote the pmf for the $i$th symbol $W_i$, and
define the time-shared random variable $W$ on $\mathscr{W}$ via

\[ p_W(w) = \frac{1}{\kappa_c t} \sum_{i=1}^{\kappa_c t} p_{W_i}(w) . \tag{207} \]
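The time-sharing step relies on the concavity of mutual information in the input pmf for a fixed channel: the averaged input pmf achieves at least the average of the per-symbol mutual informations. The sketch below checks this on a toy example; the channel and the per-symbol pmfs are hypothetical values, and the function names are introduced here only for illustration.

```python
import math


def mutual_information(p_w, channel):
    """I(W;U) in bits for input pmf p_w[w] and channel[w][u] = p(u|w)."""
    n_out = len(channel[0])
    p_u = [sum(p_w[w] * channel[w][u] for w in range(len(p_w)))
           for u in range(n_out)]
    total = 0.0
    for w, pw in enumerate(p_w):
        for u in range(n_out):
            joint = pw * channel[w][u]
            if joint > 0:
                total += joint * math.log2(channel[w][u] / p_u[u])
    return total


def time_shared_pmf(per_symbol_pmfs):
    """Average the per-symbol input pmfs with equal weights."""
    n = len(per_symbol_pmfs)
    size = len(per_symbol_pmfs[0])
    return [sum(p[w] for p in per_symbol_pmfs) / n for w in range(size)]


channel = [[0.9, 0.1], [0.2, 0.8]]                   # hypothetical p(u|w)
per_symbol = [[0.1, 0.9], [0.5, 0.5], [0.8, 0.2]]    # hypothetical pmfs of W_1, W_2, W_3

p_w = time_shared_pmf(per_symbol)
average_mi = sum(mutual_information(p, channel) for p in per_symbol) / len(per_symbol)
mixture_mi = mutual_information(p_w, channel)
```

Here `mixture_mi` is never smaller than `average_mi`, which is the inequality that lets the single-letter quantity $I(W;U)$ upper bound the normalised sum of $I(W_i;U_i)$.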

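We will show that

\[ \kappa_c I(W; U) \ge \kappa_s I(X; \hat{X}, \hat{Y} | Y) \quad \text{and} \tag{208a} \]
\[ \kappa_c I(W; V) \ge \kappa_s I(Y; \hat{X}, \hat{Y} | X) \tag{208b} \]

for some test channel $p_{\hat{X}\hat{Y}|XY} \in \mathcal{P}_{\hat{X}\hat{Y}|XY}(d_1, d_2)$.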

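The next inequality, which will be useful later, follows from Fano's inequality [11] and (206):

\[ \varepsilon(\kappa_s, t, \epsilon) \ge \frac{1}{t} H\big( g_2^{(t)}(\mathbf{V}, \mathbf{X}) \,\big|\, \phi_2^{(t)}(\mathbf{U}, \mathbf{Y}) \big) , \tag{209} \]

where

\[ \varepsilon(\kappa_s, t, \epsilon) \triangleq h(\epsilon) + \epsilon \kappa_s \log_2 |\hat{\mathscr{X}}| |\hat{\mathscr{Y}}| . \tag{210} \]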
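For intuition, Fano's inequality in the weakened form used in bounds of this kind, $H(G | \Phi) \le h(P_e) + P_e \log_2 |\mathscr{G}|$ with $P_e = \Pr[G \ne \Phi]$, can be verified on a small joint pmf. The joint pmf below is a hypothetical example and the helper names are assumptions made for this sketch.

```python
import math


def binary_entropy(p):
    """Binary entropy h(p) in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def conditional_entropy(joint):
    """H(G | Phi) in bits for joint[g][phi] = Pr[G = g, Phi = phi]."""
    n_g, n_phi = len(joint), len(joint[0])
    total = 0.0
    for phi in range(n_phi):
        p_phi = sum(joint[g][phi] for g in range(n_g))
        for g in range(n_g):
            if joint[g][phi] > 0:
                total -= joint[g][phi] * math.log2(joint[g][phi] / p_phi)
    return total


def mismatch_probability(joint):
    """Pr[G != Phi] when both variables share the same index set."""
    return sum(joint[g][phi]
               for g in range(len(joint))
               for phi in range(len(joint[0]))
               if g != phi)


def fano_bound(eps, alphabet_size):
    """Weakened Fano bound h(eps) + eps * log2(alphabet size)."""
    return binary_entropy(eps) + eps * math.log2(alphabet_size)


# Hypothetical joint pmf of a reconstruction G and an estimate Phi of it.
joint = [[0.30, 0.01, 0.02],
         [0.02, 0.30, 0.01],
         [0.01, 0.02, 0.31]]

p_e = mismatch_probability(joint)
lhs = conditional_entropy(joint)
rhs = fano_bound(p_e, alphabet_size=3)
```

As the mismatch probability shrinks, the bound shrinks with it, which is why the slack term vanishes as $\epsilon \to 0$.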



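We first invoke the techniques used in the converse proof of Theorem 5; specifically, we have

\begin{align*}
\kappa_c I(W; U) &\ge \frac{1}{t} \sum_{i=1}^{\kappa_c t} I(W_i; U_i) \\
&\ge \frac{1}{t} I(\mathbf{W}; \mathbf{U}) \\
&\ge \frac{1}{t} I(\mathbf{X}, \mathbf{Y}; \mathbf{U}) \\
&\ge \frac{1}{t} I(\mathbf{X}; \mathbf{U} | \mathbf{Y}) .
\end{align*}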
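We now invoke the techniques used in the converse proof of Theorem 1. Specifically, we have

\begin{align}
\kappa_c I(W; U) &\ge \frac{1}{t} I(\mathbf{X}; \mathbf{U} | \mathbf{Y}) \tag{211} \\
&= \frac{1}{t} I\big(\mathbf{X}; \mathbf{U}, g_1^{(t)}(\mathbf{U}, \mathbf{Y}), \phi_2^{(t)}(\mathbf{U}, \mathbf{Y}) \,\big|\, \mathbf{Y}\big) \tag{212} \\
&= \frac{1}{t} I\big(\mathbf{X}; \mathbf{U}, g_1^{(t)}(\mathbf{U}, \mathbf{Y}), \phi_2^{(t)}(\mathbf{U}, \mathbf{Y}), g_2^{(t)}(\mathbf{V}, \mathbf{X}) \,\big|\, \mathbf{Y}\big) \notag \\
&\quad - \frac{1}{t} I\big(\mathbf{X}; g_2^{(t)}(\mathbf{V}, \mathbf{X}) \,\big|\, \mathbf{Y}, \mathbf{U}, g_1^{(t)}(\mathbf{U}, \mathbf{Y}), \phi_2^{(t)}(\mathbf{U}, \mathbf{Y})\big) \tag{213} \\
&\ge \frac{1}{t} I\big(\mathbf{X}; g_1^{(t)}(\mathbf{U}, \mathbf{Y}), g_2^{(t)}(\mathbf{V}, \mathbf{X}) \,\big|\, \mathbf{Y}\big) - \varepsilon(\kappa_s, t, \epsilon) \tag{214} \\
&= \frac{1}{t} \sum_{j=1}^{\kappa_s t} I\big(X_j; g_1^{(t)}(\mathbf{U}, \mathbf{Y}), g_2^{(t)}(\mathbf{V}, \mathbf{X}) \,\big|\, \mathbf{Y}, X_1^{j-1}\big) - \varepsilon(\kappa_s, t, \epsilon) \tag{215} \\
&= \frac{1}{t} \sum_{j=1}^{\kappa_s t} I\big(X_j; g_1^{(t)}(\mathbf{U}, \mathbf{Y}), g_2^{(t)}(\mathbf{V}, \mathbf{X}), X_1^{j-1}, Y_1^{j-1}, Y_{j+1}^{\kappa_s t} \,\big|\, Y_j\big) - \varepsilon(\kappa_s, t, \epsilon) \tag{216} \\
&\ge \frac{1}{t} \sum_{j=1}^{\kappa_s t} I\big(X_j; g_1^{(t)}(\mathbf{U}, \mathbf{Y}), g_2^{(t)}(\mathbf{V}, \mathbf{X}) \,\big|\, Y_j\big) - \varepsilon(\kappa_s, t, \epsilon) \tag{217} \\
&\ge \frac{1}{t} \sum_{j=1}^{\kappa_s t} I\big(X_j; \hat{X}_j, \hat{Y}_j \,\big|\, Y_j\big) - \varepsilon(\kappa_s, t, \epsilon) , \tag{218}
\end{align}

where (212) follows because $\mathbf{X} \leftrightarrow (\mathbf{U}, \mathbf{Y}) \leftrightarrow (g_1^{(t)}(\mathbf{U}, \mathbf{Y}), \phi_2^{(t)}(\mathbf{U}, \mathbf{Y}))$ forms a Markov chain, and (214)
follows from (209) and

\begin{align}
\varepsilon(\kappa_s, t, \epsilon) &\ge \frac{1}{t} H\big( g_2^{(t)}(\mathbf{V}, \mathbf{X}) \,\big|\, \phi_2^{(t)}(\mathbf{U}, \mathbf{Y}) \big) \tag{219} \\
&\ge \frac{1}{t} I\big(\mathbf{X}; g_2^{(t)}(\mathbf{V}, \mathbf{X}) \,\big|\, \mathbf{Y}, \mathbf{U}, g_1^{(t)}(\mathbf{U}, \mathbf{Y}), \phi_2^{(t)}(\mathbf{U}, \mathbf{Y})\big) . \tag{220}
\end{align}

Here $\hat{X}_j$ and $\hat{Y}_j$ denote the $j$th symbols of $g_1^{(t)}(\mathbf{U}, \mathbf{Y})$ and $g_2^{(t)}(\mathbf{V}, \mathbf{X})$. A similar procedure yields

\[ \kappa_c I(W; V) \ge \frac{1}{t} \sum_{j=1}^{\kappa_s t} I\big(Y_j; \hat{X}_j, \hat{Y}_j \,\big|\, X_j\big) - \varepsilon(\kappa_s, t, \epsilon) . \tag{221} \]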



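For $j = 1, 2, \ldots, \kappa_s t$, let $\hat{X}_j$ and $\hat{Y}_j$ denote the $j$th symbols of $g_1^{(t)}(\mathbf{U}, \mathbf{Y})$ and $g_2^{(t)}(\mathbf{V}, \mathbf{X})$, respectively.
Let $p_{\hat{X}_j \hat{Y}_j | X_j Y_j}(\hat{x}, \hat{y} | x, y)$ denote the conditional probability of $(\hat{X}_j, \hat{Y}_j)$ given $(X_j, Y_j)$, and define

\[ p_{\hat{X}\hat{Y}|XY}(\hat{x}, \hat{y} | x, y) \triangleq \frac{1}{\kappa_s t} \sum_{j=1}^{\kappa_s t} p_{\hat{X}_j \hat{Y}_j | X_j Y_j}(\hat{x}, \hat{y} | x, y) . \tag{222} \]

The average distortion requirement on the code guarantees

\[ \sum_{x, y, \hat{x}, \hat{y}} p_{\hat{X}\hat{Y}|XY}(\hat{x}, \hat{y} | x, y)\, q_{XY}(x, y)\, \delta_1(x, \hat{x}) \le d_1 + \epsilon \quad \text{and} \tag{223} \]
\[ \sum_{x, y, \hat{x}, \hat{y}} p_{\hat{X}\hat{Y}|XY}(\hat{x}, \hat{y} | x, y)\, q_{XY}(x, y)\, \delta_2(y, \hat{y}) \le d_2 + \epsilon . \tag{224} \]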
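We further have

\[ \frac{1}{t} \sum_{j=1}^{\kappa_s t} I(X_j; \hat{X}_j, \hat{Y}_j | Y_j) \ge \kappa_s I(X; \hat{X}, \hat{Y} | Y) , \quad \text{and} \tag{225} \]
\[ \frac{1}{t} \sum_{j=1}^{\kappa_s t} I(Y_j; \hat{X}_j, \hat{Y}_j | X_j) \ge \kappa_s I(Y; \hat{X}, \hat{Y} | X) , \tag{226} \]

where we have used Jensen's inequality together with the convexity of $I(X; \hat{X}, \hat{Y} | Y)$ and $I(Y; \hat{X}, \hat{Y} | X)$
in $p_{\hat{X}\hat{Y}|XY}$. Thus, we have shown that there exists a conditional pmf $p_{\hat{X}\hat{Y}|XY}(\hat{x}, \hat{y} | x, y)$ and a pmf $p_W$
such that

\[ \kappa_c I(W; U) \ge \kappa_s I(X; \hat{X}, \hat{Y} | Y) - \varepsilon(\kappa_s, t, \epsilon) \quad \text{and} \tag{227} \]
\[ \kappa_c I(W; V) \ge \kappa_s I(Y; \hat{X}, \hat{Y} | X) - \varepsilon(\kappa_s, t, \epsilon) . \tag{228} \]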

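D. Proof of Theorem 6

The necessary condition follows from Theorem 5. We now show that this necessary condition is
also sufficient for small distortions. From Theorem 4, a sufficient condition for $(d_1, d_2)$ to be
achievable is that there exist a pmf $p_W$ on $\mathscr{W}$ and a test channel $p_{\hat{X}\hat{Y}|XY} \in \mathcal{P}_{\hat{X}\hat{Y}|XY}(d_1, d_2)$ such that
the conditions of Theorem 4 hold.
Choose $p_{\hat{X}\hat{Y}|XY} \in \mathcal{P}_{\hat{X}\hat{Y}|XY}(d_1, d_2)$ to achieve the minimum in the definition of the joint RD function
$R_{XY}(d_1, d_2)$. In a similar manner to the proof of Corollary 8.3, we have that

\[ I(X; \hat{X}, \hat{Y} | Y) \le R_{XY}(d_1, d_2) - R_Y(d_2) \quad \text{and} \tag{229a} \]
\[ I(Y; \hat{X}, \hat{Y} | X) \le R_{XY}(d_1, d_2) - R_X(d_1) . \tag{229b} \]

From Gray [19, Thm. 3.2], there exists a strictly positive surface $\mathscr{D}$ such that $R_{X|Y}(d_1) =
R_{XY}(d_1, d_2) - R_Y(d_2)$ and $R_{Y|X}(d_2) = R_{XY}(d_1, d_2) - R_X(d_1)$ whenever $(d_1, d_2)$ lies on or below $\mathscr{D}$.
Since $I(X; \hat{X}, \hat{Y} | Y) \ge R_{X|Y}(d_1)$ and $I(Y; \hat{X}, \hat{Y} | X) \ge R_{Y|X}(d_2)$ by the definition of the conditional RD
functions, for these small distortions we have that $I(X; \hat{X}, \hat{Y} | Y) = R_{X|Y}(d_1)$ and $I(Y; \hat{X}, \hat{Y} | X) = R_{Y|X}(d_2)$.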

VI. Conclusion 

The downlink broadcast channel of the two-way relay network was studied in the source coding
and joint source-channel coding settings. Single-letter necessary and sufficient conditions for reliable
communication were given for the following special cases: common-reconstructions (Theorems 1 and 4),
small distortions (Theorems 2 and 6), conditionally independent sources (Corollary 7.1), deterministic
distortion measures (Corollary 7.2), and sources with zero rate-loss for the Wyner-Ziv problem [9].
Additionally, the notion of small distortions was explicitly characterised for the doubly symmetric binary
source with Hamming distortion measures in Theorem 3. Each of the aforementioned results followed,
in part, from the necessary conditions presented in Theorems 5 and 7. It remains to be determined whether
these necessary conditions are also sufficient.
More generally, the source coding problem is a special case of the Wyner-Ziv problem with two
receivers [36], [37], and the joint source-channel coding problem is a special case of the Wyner-Ziv
coding over broadcast channels problem [27]. It would be interesting to see if the small distortion results
in this paper carry over to these problems.
Appendix A
An Improved Lower Bound for $R(d_1, d_2)$

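In this section, we present an alternative to the cut-set lower bound $R_L(d_1, d_2)$ given in Theorem 7
(see the end of Section IV-A). For this purpose, let $\mathscr{A}$, $\mathscr{B}$ and $\mathscr{C}$ be finite alphabets of cardinality

\[ |\mathscr{C}| \le |\mathscr{X}| |\mathscr{Y}| + 5 , \tag{230a} \]
\[ |\mathscr{A}| \le |\mathscr{X}| |\mathscr{Y}| |\mathscr{C}| + 2 \quad \text{and} \tag{230b} \]
\[ |\mathscr{B}| \le |\mathscr{X}| |\mathscr{Y}| |\mathscr{C}| + 2 . \tag{230c} \]

The new lower bound will be obtained by minimizing a certain function over the following set of joint
pmfs. Let $\mathcal{P}^*_L(d_1, d_2)$ denote the set of pmfs $p$ on $\mathscr{A} \times \mathscr{B} \times \mathscr{C} \times \mathscr{X} \times \mathscr{Y}$ where

(i) $A \leftrightarrow (X, Y, C) \leftrightarrow B$ forms a Markov chain, i.e.,

\[ I(A; B | X, Y, C) = 0 , \tag{231} \]

(ii) $(A, B)$ is independent of $(X, Y)$, i.e.,

\[ I(X, Y; A, B) = 0 , \tag{232} \]

(iii) there exist functions

\[ \pi_1 : \mathscr{A} \times \mathscr{C} \times \mathscr{Y} \to \hat{\mathscr{X}} \quad \text{and} \tag{233a} \]
\[ \pi_2 : \mathscr{B} \times \mathscr{C} \times \mathscr{X} \to \hat{\mathscr{Y}} \tag{233b} \]

such that

\[ E_p\big[ \delta_1(X, \pi_1(A, C, Y)) \big] \le d_1 , \quad \text{and} \tag{234a} \]
\[ E_p\big[ \delta_2(Y, \pi_2(B, C, X)) \big] \le d_2 . \tag{234b} \]

Define

\[ R^*_L(d_1, d_2) \triangleq \min_{p \in \mathcal{P}^*_L(d_1, d_2)} \Big[ I(X, Y; C) + \max\big\{ I(X; A | C, Y),\ I(Y; B | C, X) \big\} \Big] . \tag{235} \]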
The next theorem gives a lower bound for $R(d_1, d_2)$.
Theorem 9: For $(d_1, d_2)$, we have that

\[ R(d_1, d_2) \ge R^*_L(d_1, d_2) \ge R_L(d_1, d_2) . \tag{236} \]



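A. Proof: $R(d_1, d_2) \ge R^*_L(d_1, d_2)$

If $r$ is $(d_1, d_2)$-admissible, then there exists a monotonically decreasing sequence $\{\epsilon_i\}$ with limit zero;
a monotonically increasing sequence $\{n_i\}$; and a sequence of RD codes $\{(f^{(n_i)}, g_1^{(n_i)}, g_2^{(n_i)})\}$ such that
$\kappa^{(n_i)} \le r + \epsilon_i$, $\Delta_1^{(n_i)} \le d_1 + \epsilon_i$ and $\Delta_2^{(n_i)} \le d_2 + \epsilon_i$. Then we have

\begin{align}
r + \epsilon_i &\ge \frac{1}{n_i} \log_2 |\mathscr{M}^{(n_i)}| \tag{237} \\
&\ge \frac{1}{n_i} H(M) \tag{238} \\
&\ge \frac{1}{n_i} I(\mathbf{X}, \mathbf{Y}; M) \tag{239} \\
&= \frac{1}{n_i} \big[ I(\mathbf{X}; M | \mathbf{Y}) + I(\mathbf{Y}; M) \big] \tag{240} \\
&= \frac{1}{n_i} \sum_{j=1}^{n_i} \big[ I(X_j; M | X_1^{j-1}, \mathbf{Y}) + I(Y_j; M | Y_1^{j-1}) \big] \tag{241} \\
&= \frac{1}{n_i} \sum_{j=1}^{n_i} \big[ I(X_j; M, X_1^{j-1}, Y_1^{j-1}, Y_{j+1}^{n_i} | Y_j) + I(Y_j; M, Y_1^{j-1}) \big] \tag{242} \\
&\ge \frac{1}{n_i} \sum_{j=1}^{n_i} \big[ I(X_j; M, Y_1^{j-1}, Y_{j+1}^{n_i} | Y_j) + I(Y_j; M) \big] \tag{243} \\
&= \frac{1}{n_i} \sum_{j=1}^{n_i} \big[ I(X_j; M | Y_j) + I(X_j; Y_1^{j-1}, Y_{j+1}^{n_i} | M, Y_j) + I(Y_j; M) \big] \tag{244} \\
&= \frac{1}{n_i} \sum_{j=1}^{n_i} \big[ I(X_j, Y_j; M) + I(X_j; Y_1^{j-1}, Y_{j+1}^{n_i} | M, Y_j) \big] , \tag{245}
\end{align}

where (237) follows from the definition of a $(d_1, d_2)$-admissible rate, (238) through (241) follow from
standard identities, (242) follows because $(X, Y)$ is i.i.d., and (243) through (245) follow from standard
identities. In a similar manner, it can also be shown that

\[ r + \epsilon_i \ge \frac{1}{n_i} \sum_{j=1}^{n_i} \big[ I(X_j, Y_j; M) + I(Y_j; X_1^{j-1}, X_{j+1}^{n_i} | M, X_j) \big] . \tag{246} \]

For $j = 1, 2, \ldots, n_i$, define $\mathscr{A}_j = \mathscr{Y}^{n_i - 1}$, $\mathscr{B}_j = \mathscr{X}^{n_i - 1}$, and $\mathscr{C}_j = \mathscr{M}^{(n_i)}$. We consider $\{\mathscr{C}_j\}$,
$j = 1, 2, \ldots, n_i$, to be a class of disjoint sets. Similarly, we consider $\{\mathscr{A}_j\}$ and $\{\mathscr{B}_j\}$ to be disjoint sets.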
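Now define

\[ A_j \triangleq (Y_1^{j-1}, Y_{j+1}^{n_i}) , \tag{247a} \]
\[ B_j \triangleq (X_1^{j-1}, X_{j+1}^{n_i}) \quad \text{and} \tag{247b} \]
\[ C_j \triangleq M . \tag{247c} \]

Let $p^*_j$ denote the resultant joint pmf on $\mathscr{A}_j \times \mathscr{B}_j \times \mathscr{C}_j \times \mathscr{X} \times \mathscr{Y}$ that characterises the random
variables $A_j$, $B_j$, $C_j$, $X_j$ and $Y_j$. By construction, we have

1) $(X_j, Y_j)$ is independent of $(A_j, B_j)$, i.e.,

\[ I_{p^*_j}(X_j, Y_j; A_j, B_j) = I(X_j, Y_j; Y_1^{j-1}, Y_{j+1}^{n_i}, X_1^{j-1}, X_{j+1}^{n_i}) = 0 , \tag{248} \]

2) there exists a function $\pi_{1,j} : \mathscr{A}_j \times \mathscr{C}_j \times \mathscr{Y} \to \hat{\mathscr{X}}$ such that $\hat{X}_j = \pi_{1,j}(A_j, C_j, Y_j)$,

3) there exists a function $\pi_{2,j} : \mathscr{B}_j \times \mathscr{C}_j \times \mathscr{X} \to \hat{\mathscr{Y}}$ such that $\hat{Y}_j = \pi_{2,j}(B_j, C_j, X_j)$.

Now define $\mathscr{A} = \cup_j \mathscr{A}_j$, $\mathscr{B} = \cup_j \mathscr{B}_j$ and $\mathscr{C} = \cup_j \mathscr{C}_j$, and the "time-shared" pmf

\[ p^*(x, y, a, b, c) = \begin{cases} \dfrac{1}{n_i}\, p^*_j(a, b, c, x, y) , & \text{if } a \in \mathscr{A}_j,\ b \in \mathscr{B}_j,\ c \in \mathscr{C}_j, \\ 0 , & \text{otherwise,} \end{cases} \]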
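on $\mathscr{A} \times \mathscr{B} \times \mathscr{C} \times \mathscr{X} \times \mathscr{Y}$. Using this definition, it can be verified that

\begin{align}
I_{p^*}(X, Y; C) &= \frac{1}{n_i} \sum_{j=1}^{n_i} I_{p^*_j}(X_j, Y_j; C_j) , \tag{249a} \\
I_{p^*}(X; A | C, Y) &= \frac{1}{n_i} \sum_{j=1}^{n_i} I_{p^*_j}(X_j; A_j | C_j, Y_j) , \tag{249b} \\
I_{p^*}(Y; B | C, X) &= \frac{1}{n_i} \sum_{j=1}^{n_i} I_{p^*_j}(Y_j; B_j | C_j, X_j) , \tag{249c} \\
I_{p^*}(X, Y; A, B) &= \frac{1}{n_i} \sum_{j=1}^{n_i} I_{p^*_j}(X_j, Y_j; A_j, B_j) = 0 . \tag{249d}
\end{align}

Furthermore, by definition, we have

\begin{align}
d_1 + \epsilon_i &\ge \Delta_1^{(n_i)} \tag{250} \\
&= \frac{1}{n_i} \sum_{j=1}^{n_i} E_{p^*_j}\big[ \delta_1(X_j, \pi_{1,j}(A_j, C_j, Y_j)) \big] \tag{251} \\
&= E_{p^*}\big[ \delta_1(X, \pi_1(A, C, Y)) \big] , \tag{252}
\end{align}

where the last expectation is taken with respect to $p^*$, and $\pi_1 : \mathscr{A} \times \mathscr{C} \times \mathscr{Y} \to \hat{\mathscr{X}}$ is defined by

\[ \pi_1(a, c, y) \triangleq \begin{cases} \pi_{1,j}(a, c, y) , & \text{if } a \in \mathscr{A}_j \text{ and } c \in \mathscr{C}_j, \\ \hat{x}^* , & \text{otherwise,} \end{cases} \tag{253} \]

where $\hat{x}^* \in \hat{\mathscr{X}}$ is arbitrary. Similarly,

\[ d_2 + \epsilon_i \ge E_{p^*}\big[ \delta_2(Y, \pi_2(B, C, X)) \big] . \tag{254} \]

At this point we have that

\[ r + \epsilon_i \ge \inf \Big[ I_{p^*}(X, Y; C) + \max\big\{ I_{p^*}(X; A | C, Y),\ I_{p^*}(Y; B | C, X) \big\} \Big] , \tag{255} \]

where the infimum is taken over all $p^*$ satisfying $I_{p^*}(X, Y; A, B) = 0$ as well as

\[ d_1 + \epsilon_i \ge E_{p^*}\big[ \delta_1(X, \pi_1(A, C, Y)) \big] \quad \text{and} \tag{256a} \]
\[ d_2 + \epsilon_i \ge E_{p^*}\big[ \delta_2(Y, \pi_2(B, C, X)) \big] . \tag{256b} \]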
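Note that this infimum is not altered if we impose the Markov chain $A \leftrightarrow (X, Y, C) \leftrightarrow B$. Finally, we apply
the support lemma [38] to bound the cardinality of $\mathscr{C}$ by $|\mathscr{X}| |\mathscr{Y}| + 5$, and $\mathscr{A}$ and $\mathscr{B}$ by $|\mathscr{X}| |\mathscr{Y}| |\mathscr{C}| + 2$.
($|\mathscr{A}|$ and $|\mathscr{B}|$ can be bounded simultaneously since $A \leftrightarrow (X, Y, C) \leftrightarrow B$ forms a Markov chain.)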
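The time-sharing identities in the proof above rest on one fact: when the per-index alphabets are disjoint and every index has the same source marginal, the mutual information under the equal-weight mixture equals the average of the per-index mutual informations. The sketch below checks this for a pair $(Z, C)$, where $Z$ stands in for $(X, Y)$; the pmfs and names are hypothetical and serve only as an illustration.

```python
import math


def mutual_information(joint):
    """I(Z;C) in bits for a pmf stored as {(z, c): probability}."""
    p_z, p_c = {}, {}
    for (z, c), v in joint.items():
        p_z[z] = p_z.get(z, 0.0) + v
        p_c[c] = p_c.get(c, 0.0) + v
    return sum(v * math.log2(v / (p_z[z] * p_c[c]))
               for (z, c), v in joint.items() if v > 0)


def time_share(per_index_joints):
    """Mix the pmfs with weight 1/n after tagging each c with its index j,
    which makes the per-index alphabets disjoint."""
    n = len(per_index_joints)
    mixed = {}
    for j, joint in enumerate(per_index_joints):
        for (z, c), v in joint.items():
            mixed[(z, (j, c))] = v / n
    return mixed


q_z = {0: 0.3, 1: 0.7}                      # common marginal of every Z_j
channels = [                                 # hypothetical p(c | z) for each index j
    {0: [0.9, 0.1], 1: [0.2, 0.8]},
    {0: [0.5, 0.5], 1: [0.1, 0.9]},
    {0: [0.7, 0.3], 1: [0.6, 0.4]},
]
per_index = [
    {(z, c): q_z[z] * ch[z][c] for z in q_z for c in (0, 1)}
    for ch in channels
]

average_mi = sum(mutual_information(j) for j in per_index) / len(per_index)
mixture_mi = mutual_information(time_share(per_index))
```

The equality holds because the tagged value of $C$ reveals the index, while the index itself carries no information about $Z$ when all marginals coincide.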



B. Proof: $R^*_L(d_1, d_2) \ge R_L(d_1, d_2)$

Enlarge the set $\mathcal{P}^*_L(d_1, d_2)$ by removing the constraints $I(X, Y; A, B) = 0$ and $A \leftrightarrow (X, Y, C) \leftrightarrow B$.
Denote this new set by $\mathcal{P}'_L(d_1, d_2)$. Then,

\begin{align}
R^*_L(d_1, d_2) &= \min_{p \in \mathcal{P}^*_L(d_1, d_2)} \Big[ I(X, Y; C) + \max\big\{ I(X; A | C, Y),\ I(Y; B | C, X) \big\} \Big] \notag \\
&\ge \min_{p \in \mathcal{P}'_L(d_1, d_2)} \Big[ I(X, Y; C) + \max\big\{ I(X; A | C, Y),\ I(Y; B | C, X) \big\} \Big] \tag{257} \\
&= \min_{p \in \mathcal{P}'_L(d_1, d_2)} \max\big\{ I(X, Y; C) + I(X; A | C, Y),\ I(X, Y; C) + I(Y; B | C, X) \big\} \tag{258} \\
&\ge \min_{p \in \mathcal{P}'_L(d_1, d_2)} \max\big\{ I(X; A, C | Y),\ I(Y; B, C | X) \big\} \tag{259} \\
&= \min_{p \in \mathcal{P}'_L(d_1, d_2)} \max\big\{ I(X; A, C, Y | Y),\ I(Y; B, C, X | X) \big\} \tag{260} \\
&\ge \min_{p \in \mathcal{P}'_L(d_1, d_2)} \max\big\{ I(X; \pi_1(A, C, Y) | Y),\ I(Y; \pi_2(B, C, X) | X) \big\} \tag{261} \\
&\ge \max\big\{ R_{X|Y}(d_1),\ R_{Y|X}(d_2) \big\} \tag{262} \\
&= R_L(d_1, d_2) , \tag{263}
\end{align}

where (257) follows because $\mathcal{P}^*_L(d_1, d_2) \subseteq \mathcal{P}'_L(d_1, d_2)$, (259) and (260) follow from the chain rule for
mutual information, (261) follows from the data processing inequality, (262) follows from the definition of
the conditional rate-distortion function, and (263) follows from the definition of $R_L(d_1, d_2)$.
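The chain-rule step in this proof uses $I(X, Y; C) + I(X; A | C, Y) \ge I(X; C | Y) + I(X; A | C, Y) = I(X; A, C | Y)$. As a hedged sanity check, the sketch below evaluates both sides on a randomly generated joint pmf over binary $(A, C, X, Y)$; the pmf and the helper names are assumptions made for illustration.

```python
import itertools
import math
import random


def marginal(p, idx):
    """Marginalise a pmf stored as {tuple: prob} onto the coordinates in idx."""
    out = {}
    for key, v in p.items():
        k = tuple(key[i] for i in idx)
        out[k] = out.get(k, 0.0) + v
    return out


def cond_mi(p, left, right, given=()):
    """I(left; right | given) in bits; arguments are tuples of coordinate indices."""
    p_lrg = marginal(p, left + right + given)
    p_lg = marginal(p, left + given)
    p_rg = marginal(p, right + given)
    p_g = marginal(p, given)
    nl, nr = len(left), len(right)
    total = 0.0
    for key, v in p_lrg.items():
        if v > 0:
            l, r, g = key[:nl], key[nl:nl + nr], key[nl + nr:]
            total += v * math.log2(v * p_g[g] / (p_lg[l + g] * p_rg[r + g]))
    return total


# Random joint pmf on (A, C, X, Y), all binary; coordinates 0..3 in that order.
random.seed(0)
weights = {k: random.random() for k in itertools.product((0, 1), repeat=4)}
norm = sum(weights.values())
p = {k: w / norm for k, w in weights.items()}
A, C, X, Y = (0,), (1,), (2,), (3,)

lhs = cond_mi(p, X + Y, C) + cond_mi(p, X, A, C + Y)   # I(X,Y;C) + I(X;A|C,Y)
rhs = cond_mi(p, X, A + C, Y)                          # I(X;A,C|Y)
chain = cond_mi(p, X, C, Y) + cond_mi(p, X, A, C + Y)  # chain-rule expansion of rhs
```

The gap `lhs - rhs` equals $I(Y; C)$, which is the term discarded in that step.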
Appendix B

Convexity of Conditional Mutual Information

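Proof: Suppose $(A, B)$ is defined by a (fixed) joint pmf $p_{AB}$. Let $p^{(1)}_{C|AB}$ and $p^{(2)}_{C|AB}$ be two conditional
pmfs for $C$ given $(A, B)$. For $i = 1, 2$, let

\[ p^{(i)}_{ABC}(a, b, c) = p^{(i)}_{C|AB}(c | a, b)\, p_{AB}(a, b) , \quad (a, b, c) \in \mathscr{A} \times \mathscr{B} \times \mathscr{C} \tag{264} \]

denote the resulting joint pmfs. We identify the marginals of these pmfs with subscripts in the usual way;
for example,

\[ p^{(i)}_{AC}(a, c) = \sum_{b \in \mathscr{B}} p^{(i)}_{ABC}(a, b, c) , \quad (a, c) \in \mathscr{A} \times \mathscr{C},\ i = 1, 2 . \tag{265} \]

Choose $\alpha_1$ and $\alpha_2$ such that $0 \le \alpha_1, \alpha_2 \le 1$ and $\alpha_1 + \alpha_2 = 1$. Let

\[ p^*_{C|AB}(c | a, b) = \alpha_1 p^{(1)}_{C|AB}(c | a, b) + \alpha_2 p^{(2)}_{C|AB}(c | a, b) . \tag{266} \]

As before, let $p^*_{ABC}$ denote the resultant joint pmf for $(A, B, C)$ when $p^*_{C|AB}$ is combined with $p_{AB}$.

We wish to evaluate the conditional mutual information $I(A; C | B)$ with respect to the three conditional
probabilities $p^{(1)}_{C|AB}$, $p^{(2)}_{C|AB}$ and $p^*_{C|AB}$ (note that $p^{(1)}_{AB}(a, b) = p^{(2)}_{AB}(a, b) = p_{AB}(a, b)$). In particular, the
lemma will be proved if it can be shown that

\[ \sum_{i=1}^{2} \alpha_i\, I(A; C | B)\big[ p^{(i)}_{C|AB} \big] \ge I(A; C | B)\big[ p^*_{C|AB} \big] , \tag{267} \]

where $I(A; C | B)[p'_{C|AB}]$ should be understood as the conditional mutual information $I(A; C | B)$ when
the joint probability of $(A, B, C)$ is defined by $p_{AB}$ and $p'_{C|AB}$. For this purpose, we write $I(A; C | B)$
explicitly as a function of $p'_{C|AB}$:

\[ I(A; C | B) = \sum_{a, b, c} p_B(b)\, p_{A|B}(a | b)\, p'_{C|AB}(c | a, b) \log \frac{p'_{C|AB}(c | a, b)}{p'_{C|B}(c | b)} , \tag{268} \]

where the conditional probability $p'_{C|B}$ is a function of the other arguments:

\[ p'_{C|B}(c | b) = \sum_{a} p_{A|B}(a | b)\, p'_{C|AB}(c | a, b) . \tag{269} \]

Then we have

\begin{align}
\sum_{i=1}^{2} \alpha_i\, I(A; C | B)\big[ p^{(i)}_{C|AB} \big]
&= \sum_{i=1}^{2} \sum_{a, b, c} \alpha_i\, p_B(b)\, p_{A|B}(a | b)\, p^{(i)}_{C|AB}(c | a, b) \log \frac{p^{(i)}_{C|AB}(c | a, b)}{p^{(i)}_{C|B}(c | b)} \tag{270} \\
&= \sum_{b} p_B(b) \sum_{i=1}^{2} \alpha_i \sum_{a, c} p_{A|B}(a | b)\, p^{(i)}_{C|AB}(c | a, b) \log \frac{p^{(i)}_{C|AB}(c | a, b)}{p^{(i)}_{C|B}(c | b)} \tag{271} \\
&\ge \sum_{b} p_B(b) \sum_{a, c} p_{A|B}(a | b)\, p^*_{C|AB}(c | a, b) \log \frac{p^*_{C|AB}(c | a, b)}{p^*_{C|B}(c | b)} \tag{272} \\
&= I(A; C | B)\big[ p^*_{C|AB} \big] , \tag{273}
\end{align}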






where the inequality follows from the convexity of mutual information in the channel for a fixed input
distribution [11]. ■
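The convexity statement proved above can also be checked numerically: for a fixed $p_{AB}$, mixing two conditional pmfs for $C$ given $(A, B)$ cannot increase $I(A; C | B)$ beyond the same mixture of the two individual values. The sketch below is a minimal illustration on randomly generated pmfs; the alphabet sizes, the weight 0.3 and all names are assumptions introduced here.

```python
import itertools
import math
import random


def cond_mi_a_c_given_b(p_ab, p_c_given_ab):
    """I(A;C|B) in bits for p_ab[(a, b)] and p_c_given_ab[(a, b)] = list over c."""
    n_c = len(next(iter(p_c_given_ab.values())))
    p_b = {}
    for (a, b), v in p_ab.items():
        p_b[b] = p_b.get(b, 0.0) + v
    # p(c|b) = sum_a p(a|b) p(c|a,b)
    p_c_given_b = {b: [0.0] * n_c for b in p_b}
    for (a, b), v in p_ab.items():
        for c in range(n_c):
            p_c_given_b[b][c] += v / p_b[b] * p_c_given_ab[(a, b)][c]
    total = 0.0
    for (a, b), v in p_ab.items():
        for c in range(n_c):
            pc = p_c_given_ab[(a, b)][c]
            if v > 0 and pc > 0:
                total += v * pc * math.log2(pc / p_c_given_b[b][c])
    return total


def random_pmf(n, rng):
    """Random pmf on n points with strictly positive masses."""
    w = [rng.random() + 1e-3 for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]


rng = random.Random(1)
pairs = list(itertools.product(range(2), range(3)))       # (a, b) values
p_ab = dict(zip(pairs, random_pmf(len(pairs), rng)))
ch1 = {k: random_pmf(3, rng) for k in pairs}               # first conditional pmf
ch2 = {k: random_pmf(3, rng) for k in pairs}               # second conditional pmf
alpha = 0.3
mix = {k: [alpha * u + (1 - alpha) * v for u, v in zip(ch1[k], ch2[k])]
       for k in pairs}

lhs = alpha * cond_mi_a_c_given_b(p_ab, ch1) + (1 - alpha) * cond_mi_a_c_given_b(p_ab, ch2)
rhs = cond_mi_a_c_given_b(p_ab, mix)
```

Here `lhs` is the mixture of the two individual values and `rhs` is the value for the mixed conditional pmf, so convexity corresponds to `lhs >= rhs`.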